ABSTRACT

We consider the problem of setting approximate confidence intervals for a single parameter
$\theta$ in a multiparameter family. The standard approximate intervals based on maximum
likelihood theory, $\hat\theta \pm \hat\sigma z^{(\alpha)}$, can be quite misleading, so in
practice tricks based on transformations, bias corrections, etc., are often used to improve
their accuracy. The bootstrap confidence intervals discussed in this paper automatically
incorporate such tricks without requiring the statistician to think them through for each new
application, at the price of a considerable increase in computational effort. In addition to
parametric families, bootstrap intervals are also developed for nonparametric situations.

Better Bootstrap Confidence Intervals


Bradley  Efron 


1.  Introduction. 

This paper concerns setting approximate confidence intervals for a real-valued parameter
$\theta$ in a multi-parameter family. The nonparametric case, where the number of nuisance
parameters is infinite, is also considered. The word "approximate" is important because in
only a few special situations can exact confidence intervals be constructed. Table 1 shows one
such situation: the data $(y_1, y_2)$ is bivariate normal with unknown mean vector
$(\eta_1, \eta_2)$, covariance matrix the identity; the parameters of interest are
$\theta = \eta_2/\eta_1$ and also $\phi = 1/\theta$. Fieller's construction (1954) gives
central 90% interval (5% error in each tail) of [.29, .76] for $\theta$, having observed
$y = (8, 4)$. The corresponding interval for $\phi = 1/\theta$ is the obvious mapping
$\phi \in [1/.76, 1/.29]$.

                                for $\theta$  (R/L)      for $\phi$  (R/L)

Exact Interval                  [.29, .76]  (1.21)       [1.32, 3.50]  (2.20)

Standard Approximation (1.1)    [.27, .73]  (1.00)       [1.08, 2.92]  (1.00)

MLE                             $\hat\theta = .5$        $\hat\phi = 2$

Table 1. Central 90% confidence intervals for $\theta = \eta_2/\eta_1$ and for
$\phi = 1/\theta$, having observed $(y_1, y_2) = (8, 4)$ from a bivariate normal distribution
$y \sim N_2(\eta, I)$. The exact intervals are based on Fieller's construction.
R/L = ratio of right side of interval, measured from the MLE, to the left side.
The exact intervals are markedly asymmetric.

Table 1 also shows the standard approximate intervals

    $\theta \in [\hat\theta + \hat\sigma z^{(\alpha)},\ \hat\theta + \hat\sigma z^{(1-\alpha)}]$,    (1.1)

where $\hat\theta$ is the maximum likelihood estimate (MLE) of $\theta$, $\hat\sigma$ is an
estimate of its standard deviation, often based on derivatives of the log likelihood function,
and $z^{(\alpha)}$ is the $100\cdot\alpha$ percentile point of a standard normal variate. In
Table 1, $\alpha = .05$, and $z^{(\alpha)} = -1.645$.

The standard intervals (1.1) are extremely useful in statistical practice because they can be
applied in an automatic way to almost any parametric situation. However they can be far from
perfect, as the results for $\phi$ show. Not only is the standard interval for $\phi$ quite
different from the exact interval, it is not even the obvious transformation [1/.73, 1/.27]
of the standard interval for $\theta$.
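
The entries of Table 1 can be checked numerically. The following is a minimal sketch, not
the paper's own computation. It assumes that the exact interval comes from the pivot
$(y_2 - \theta y_1)/\sqrt{1+\theta^2} \sim N(0,1)$, and that $\hat\sigma$ in (1.1) is
obtained by the delta method; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def fieller_interval(y1, y2, alpha=0.05):
    """Exact central interval for theta = eta2/eta1 when (y1, y2) ~ N2(eta, I)."""
    z = NormalDist().inv_cdf(1 - alpha)
    # (y2 - theta*y1) ~ N(0, 1 + theta^2), so the interval is the set of theta with
    # (y2 - theta*y1)^2 <= z^2 * (1 + theta^2): a quadratic inequality in theta.
    a = y1 ** 2 - z ** 2
    b = -2 * y1 * y2
    c = y2 ** 2 - z ** 2
    disc = b ** 2 - 4 * a * c
    if a <= 0 or disc < 0:
        raise ValueError("Fieller set is not a finite interval for these data")
    root = math.sqrt(disc)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))


def standard_interval(num, den, alpha=0.05):
    """Standard interval (1.1) for num/den; sigma-hat by the delta method (assumption)."""
    z = NormalDist().inv_cdf(1 - alpha)
    est = num / den
    sigma = math.sqrt(1 + est ** 2) / abs(den)
    return (est - z * sigma, est + z * sigma)


theta_exact = fieller_interval(8.0, 4.0)
phi_exact = (1 / theta_exact[1], 1 / theta_exact[0])   # obvious mapping phi = 1/theta
theta_std = standard_interval(4.0, 8.0)                # theta = y2/y1
phi_std = standard_interval(8.0, 4.0)                  # phi = y1/y2
```

Under these assumptions the four intervals agree with Table 1 to about two decimal places,
and the standard interval for $\phi$ is visibly not the reciprocal of the one for $\theta$.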

Approximate confidence intervals based on bootstrap computations were introduced by the
author (1981, 1982). Like the standard intervals, these can be applied automatically to almost
any situation, though at greater computational expense than (1.1). Unlike (1.1), the bootstrap
intervals transform correctly, so for example the interval for $\phi = 1/\theta$ is obtained
by inverting the endpoints of the interval for $\theta$. They also tend to be more accurate
than the standard intervals. In the situation of Table 1, the bootstrap intervals agree with
the exact intervals to three decimal places. Efron (1984) shows that this is no accident;
there is a wide class of problems for which the bootstrap intervals are an order of magnitude
more accurate than the standard intervals.

In those problems where exact confidence limits exist, the endpoints are typically of the form

    $\hat\theta + \hat\sigma\left(z^{(\alpha)} + \frac{A_n^{(\alpha)}}{\sqrt{n}} + \frac{B_n^{(\alpha)}}{n} + \cdots\right)$,    (1.2)

where $n$ is the sample size. The standard intervals (1.1) are first order correct in the
sense that the term $\hat\theta + \hat\sigma z^{(\alpha)}$ asymptotically dominates (1.2).
However the second order term $\hat\sigma A_n^{(\alpha)}/\sqrt{n}$ can have a major effect in
small sample situations. It is this term which causes the asymmetry of the exact intervals
about the MLE, as seen in Table 1. As a point of comparison, the Student-t effect is of third
order magnitude, comparable to $\hat\sigma B_n^{(\alpha)}/n$ in (1.2). The bootstrap method
described in Efron (1984) was shown to be second order correct in a certain class of problems,
automatically producing intervals of correct second order asymptotic form
$\hat\theta + \hat\sigma(z^{(\alpha)} + A_n^{(\alpha)}/\sqrt{n} + \cdots)$.

This paper describes an improved bootstrap method which is second order correct in a wider
class of problems. This wider class includes all the familiar parametric examples where there
are no nuisance parameters, and where the data has been reduced to a one-dimensional summary
statistic, with asymptotic properties of the usual MLE form (Section 4).

The bootstrap methods described here apply to either parametric or nonparametric situations.
We will begin with the simplest parametric situations and work toward the full nonparametric
case near the middle of the paper. In order to get the main ideas across, some important
technical points are deferred to the later sections.

2.  Bootstrap  Confidence  Intervals. 

This section describes the construction of improved bootstrap confidence intervals. They are
obtained by a simple modification of the bias-corrected bootstrap method (BC method)
introduced in Efron (1981, 1982). First we describe the BC method.

Let $y$ represent all the available data, and suppose that $y$ is drawn from an unknown
probability distribution $P$, belonging to a known family of distributions $\mathcal{P}$. A
familiar example is where $y = (x_1, x_2, \ldots, x_n)$, the $x_i$ being obtained by
independent identical draws (i.i.d.) from a distribution $F_\eta$ which belongs to a family
$\mathcal{F}$ indexed by an unknown parameter vector $\eta$. There is a real-valued parameter
$\theta = t(P)$ for which we have a point estimate, say $\hat\theta = s(y)$, but further
desire a confidence interval. All bootstrap methods begin as follows: having observed $y$, we
first estimate $P$ by some estimation rule $\hat P = \hat P(y)$. In the i.i.d. case for
instance, we might estimate $\eta$ by its MLE $\hat\eta$, and then take
$\hat P = F_{\hat\eta}^n$, the exponent indicating $n$ independent draws of $x_i$ from
$F_{\hat\eta}$. In the nonparametric situation, where $y = (x_1, x_2, \ldots, x_n)$, but the
i.i.d. observations $x_i$ can come from any distribution $F$ on their sample space, we would
usually take $\hat P = \hat F^n$, where $\hat F$ is the empirical distribution putting mass
$1/n$ on each observed value $x_i$. This is the original context in which the bootstrap was
suggested. However in most of this paper, except for Sections 6 and 7, we will be working with
the parametric bootstrap.
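
The two choices of $\hat P$ can be illustrated by drawing one bootstrap data set each way.
This is only a sketch: the normal family used for the parametric draw, the toy data, and
the function names are assumptions made for illustration, not part of the paper.

```python
import random
import statistics


def parametric_draw(x, rng):
    """One bootstrap data set from P-hat = F_{eta-hat}^n, assuming a normal family."""
    mu_hat = statistics.fmean(x)
    sd_hat = statistics.pstdev(x)      # MLE of the standard deviation (divisor n)
    return [rng.gauss(mu_hat, sd_hat) for _ in x]


def nonparametric_draw(x, rng):
    """One bootstrap data set from P-hat = F-hat^n, mass 1/n on each observed x_i."""
    return [rng.choice(x) for _ in x]


rng = random.Random(0)
x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]    # hypothetical observations
y_star_param = parametric_draw(x, rng)
y_star_nonpar = nonparametric_draw(x, rng)
```

The nonparametric draw can only return values already present in the data, whereas the
parametric draw returns new values generated from the fitted family.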

The BC method produces an approximate $1-2\alpha$ central confidence interval for $\theta$ by
resampling from $\hat P$:

(i) Independent bootstrap data sets $y^*_1, y^*_2, \ldots, y^*_B$ are drawn from $\hat P$.
(The number of resamples for confidence interval construction tends to be large, on the order
of $B = 1000$, see Section 8. We will assume in what follows that $B$ is very large, so that
fluctuations in the bootstrap intervals due to small $B$ are eliminated. In parametric
situations $B$ can often be made essentially infinite by using standard parametric expansions,
instead of Monte Carlo sampling, to construct the bootstrap distribution, see Efron (1984,
1984A).)

(ii) For each $y^*_b$, $b = 1, 2, \ldots, B$, the bootstrap estimate
$\hat\theta^*_b = s(y^*_b)$ is calculated.

(iii) The bootstrap cumulative distribution function (cdf) of the $\hat\theta^*_b$ values is
constructed, say

    $\hat G(s) = \frac{\#\{\hat\theta^*_b < s\}}{B}$,  $-\infty < s < \infty$.    (2.1)

(iv) The quantity

    $z_0 = \Phi^{-1}(\hat G(\hat\theta))$    (2.2)

is evaluated, where $\Phi(z) = (2\pi)^{-1/2}\int_{-\infty}^{z} e^{-s^2/2}\,ds$ is the standard
normal cdf.

(v) Finally, the BC interval is defined to be

    $\theta \in [\hat G^{-1}(\Phi(2z_0 + z^{(\alpha)})),\ \hat G^{-1}(\Phi(2z_0 + z^{(1-\alpha)}))]$,    (2.3)

where $z^{(\alpha)} = \Phi^{-1}(\alpha)$ as before.

Notice that if $\hat G(\hat\theta) = .5$, that is if half of the $\hat\theta^*_b$ values are
less than the actual estimate $\hat\theta$, then $z_0 = 0$, and the BC interval (2.3) is
simply $\theta \in [\hat G^{-1}(\alpha), \hat G^{-1}(1-\alpha)]$. In other words, we use the
obvious percentiles of the bootstrap distribution of $\hat\theta^*$ to form an approximate
confidence interval for $\theta$. If $z_0 \neq 0$, definition (2.3) makes a bias correction,
often a quite large one, as motivated in Section 10.7 of Efron (1982).
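
Steps (i) through (v) can be sketched in a few lines of code. The example below is an
illustration under stated assumptions rather than a definitive implementation: it uses a
parametric bootstrap for the scale model $y \sim \theta\chi^2_{19}$ with
$\hat\theta = y/19$ (the situation of Table 2), a finite Monte Carlo sample in place of
"essentially infinite" $B$, and a simple empirical quantile rule; `bc_interval` is a
hypothetical name.

```python
import math
import random
from statistics import NormalDist


def bc_interval(theta_hat, boot, alpha=0.05):
    """BC interval (2.3) computed from bootstrap replicates `boot` of the estimate."""
    norm = NormalDist()
    boot = sorted(boot)
    B = len(boot)
    # (2.1)-(2.2): bootstrap cdf at theta_hat, then z0 = Phi^{-1}(G-hat(theta_hat)).
    g_hat = sum(t < theta_hat for t in boot) / B
    z0 = norm.inv_cdf(g_hat)

    def g_inv(p):
        # Empirical quantile of the bootstrap distribution.
        k = min(B - 1, max(0, math.ceil(p * B) - 1))
        return boot[k]

    z_alpha = norm.inv_cdf(alpha)               # z^(alpha); z^(1-alpha) = -z^(alpha)
    lo = g_inv(norm.cdf(2 * z0 + z_alpha))
    hi = g_inv(norm.cdf(2 * z0 - z_alpha))
    return z0, (lo, hi)


# Steps (i)-(ii): theta*_b = theta_hat * chi^2_19 / 19, with theta_hat scaled to 1.
rng = random.Random(1)
theta_hat = 1.0
boot = [theta_hat * rng.gammavariate(19 / 2, 2) / 19 for _ in range(200_000)]
z0, (lo, hi) = bc_interval(theta_hat, boot)
```

With this seed and sample size the computed interval should land near
$\hat\theta\cdot[.580, 1.69]$, line 2 of Table 2, up to Monte Carlo error.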

For the situation of Table 1, the BC method produces intervals agreeing very closely with the
exact Fieller solution. Table 2 shows a less successful application, pointed out by
N. Schenker (1983). The data is the single observation $y \sim \theta\chi^2_{19}$, and a
confidence interval is desired for the scale parameter $\theta$. In this case the BC interval
based on $\hat\theta = y/19$ is a definite improvement over the standard interval (1.1), but
goes only about half as far as it should toward achieving the asymmetry of the exact interval,
$\theta \in \hat\theta\,[19/\chi^{2\,(.95)}_{19},\ 19/\chi^{2\,(.05)}_{19}]$.

Why does the BC method work better in Table 1 than in Table 2? The main result of Efron (1984)
is the following, which applies to Table 1: suppose $y$ is multivariate normal,
$y \sim N_k(\eta, I)$, $\theta = t(\eta)$, and we estimate $\theta$ by its MLE
$\hat\theta = t(y)$. Then the BC method is second order correct, as defined in Section 1.
More generally, if there exist multivariate transformations $g$ and $h$ such that
$g(y) \sim N_k(h(\eta), I)$, then the BC interval for $\theta$ based on the MLE $\hat\theta$
is still second order correct. This last result does not require the statistician to know the
normalizing transformations $g$ and $h$, only that they exist (because the BC interval (2.3)
is invariant under all such transformations, when $\hat\theta$ is the MLE).

1.  Exact                 $\hat\theta\cdot[.631, 1.88]$    R/L = 2.38
2.  BC                    $\hat\theta\cdot[.580, 1.69]$    R/L = 1.64
3.  Standard (1.1)        $\hat\theta\cdot[.466, 1.53]$    R/L = 1.00
4.  $BC_{.1077}$          $\hat\theta\cdot[.630, 1.88]$    R/L = 2.37
5.  Nonparametric BCa     $\hat\theta\cdot[.640, 1.68]$    R/L = 1.88

Table 2. Central 90% confidence intervals for $\theta$, having observed
$y \sim \theta\chi^2_{19}$. The BC method, based on the MLE $\hat\theta = y/19$ (line 2), is
only a partial improvement over the standard intervals. The improved bootstrap method
described in this section (line 4) agrees almost perfectly with the exact interval. Its
nonparametric version (line 5) is discussed in Section 6.

The situation of Table 2 is $y \sim \theta\chi^2_{19}$, or equivalently
$\hat\theta \sim \theta(\chi^2_{19}/19)$. The results of Efron (1982A) show that there is a
single monotone transformation $g$ such that, to a good approximation, $\phi = g(\theta)$,
$\hat\phi = g(\hat\theta)$ satisfy

    $\hat\phi = \phi + \sigma_\phi (Z - z_0)$,  $(Z \sim N(0,1))$    (2.4)

and

    $\sigma_\phi = 1 + a\phi$.    (2.5)

The constants in (2.4), (2.5) are $z_0 = .1082$, $a = .1077$. See Section 9, and Remark E,
Section 10. For general situations of form (2.4), (2.5), we will assume that $\phi > -1/a$ if
$a > 0$, so $\sigma_\phi > 0$, and likewise $\phi < -1/a$ if $a < 0$. The constant $a$ will
typically be in the range $|a| \le .2$, as will $z_0$. [Please note the Corrigenda to
Efron (1982A).]

Notice that if $a$ were 0 in (2.5), then (2.4) would be a normal translation family. In this
case the obvious confidence interval
$\phi \in [\hat\phi + z_0 + z^{(\alpha)},\ \hat\phi + z_0 + z^{(1-\alpha)}]$ would map into a
correct confidence interval for $\theta$, via the inverse transformation
$\theta = g^{-1}(\phi)$. As a matter of fact the resulting interval for $\theta$ equals the BC
interval. This is the rationale for definition (2.3). The advantage of (2.3) is that the
statistician need not know the mapping $g$.

The improved bootstrap method of this paper consists of a recipe for handling the situation
$a \neq 0$. Suppose $\hat\theta$ is a real-valued statistic whose density $f_\theta$ depends
only on a real-valued parameter $\theta$, and suppose also that there exists a monotone
mapping $g$ such that $\phi = g(\theta)$, $\hat\phi = g(\hat\theta)$ satisfy (2.4), (2.5).

Lemma 1. Under the conditions just stated, the correct confidence interval for $\theta$,
based on $\hat\theta$, is

    $\theta \in [\hat G^{-1}(\Phi(z[\alpha])),\ \hat G^{-1}(\Phi(z[1-\alpha]))]$,    (2.6)

where

    $z[\alpha] = z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a(z_0 + z^{(\alpha)})}$    (2.7)

and likewise for $z[1-\alpha]$. (Proof below.)

Here $\hat G$ is the (parametric) bootstrap cdf of $\hat\theta^* \sim f_{\hat\theta}$. If
$a = 0$ then interval (2.6) is the same as (2.3), but if $a \neq 0$, different percentiles of
the bootstrap distribution are employed. We will call (2.6) the BCa interval. The constant $a$
is discussed further in Sections 3 and 9.

Example: for the situation of Table 2, where $z_0 = .1082$, $a = .1077$, we calculate
$z[.05] = -1.210$, $z[.95] = 2.270$, so $\Phi(z[\alpha]) = .1131$,
$\Phi(z[1-\alpha]) = .9884$. The bootstrap distribution is
$\hat\theta^* \sim \hat\theta(\chi^2_{19}/19)$, with quantiles
$\hat G^{-1}(.1131) = \hat\theta\cdot 0.630$, $\hat G^{-1}(.9884) = \hat\theta\cdot 1.877$.
This gives the $BC_{.1077}$ interval shown in line 4. [Bartlett (1953) discusses this same
example. The BCa method can be thought of as a computer-based way to numerically carry out
Bartlett's program of improved approximate confidence intervals without having to do any
theoretical calculations.]
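
The arithmetic of this example can be reproduced directly from (2.7). The sketch below
takes $z_0$ and $a$ as the values quoted in the text, and replaces the exact $\chi^2_{19}$
quantiles by a Monte Carlo approximation, so the endpoints carry simulation error; the
helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

NORM = NormalDist()


def z_bca(alpha, z0, a):
    """Formula (2.7): z[alpha] = z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha))."""
    s = z0 + NORM.inv_cdf(alpha)
    return z0 + s / (1 - a * s)


z0, a = 0.1082, 0.1077                     # constants quoted for Table 2
z_lo, z_hi = z_bca(0.05, z0, a), z_bca(0.95, z0, a)
p_lo, p_hi = NORM.cdf(z_lo), NORM.cdf(z_hi)

# Monte Carlo stand-in for the bootstrap cdf of theta*/theta_hat ~ chi^2_19 / 19.
rng = random.Random(2)
boot = sorted(rng.gammavariate(19 / 2, 2) / 19 for _ in range(200_000))


def quantile(p):
    """Empirical quantile of the simulated bootstrap distribution."""
    return boot[min(len(boot) - 1, max(0, math.ceil(p * len(boot)) - 1))]


interval = (quantile(p_lo), quantile(p_hi))   # multiples of theta_hat
```

The percentiles `p_lo` and `p_hi` replace the nominal .05 and .95, which is the entire
difference between the BCa interval and the uncorrected percentile interval.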
The proof of the lemma begins by showing that the BCa interval for $\phi$, based on
$\hat\phi$, is correct in a certain obvious sense: notice that (2.4), (2.5) give

    $\{1 + a\hat\phi\} = \{1 + a\phi\}\{1 + a(Z - z_0)\}$.    (2.8)

Taking logarithms puts the problem into standard translation form,

    $\hat\zeta = \zeta + W$,    (2.9)

$\hat\zeta = \log\{1 + a\hat\phi\}$, $\zeta = \log\{1 + a\phi\}$, and
$W = \log\{1 + a(Z - z_0)\}$. This example is discussed more carefully in Sections 4 and 8 of
Efron (1982A), where the possibility of the bracketed terms in (2.8) being negative is dealt
with. Here it will cause no trouble to assume them positive so that it is permissible to take
logarithms. In fact the transformation to (2.9) is only for motivational purposes. A quicker
but less informative proof of the Lemma is possible working directly on the $\phi$ scale.

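The translation problem (2.9) gives a natural central $1-2\alpha$ interval for $\zeta$ having
observed $\hat\zeta$,

    $\zeta \in [\hat\zeta - w^{(1-\alpha)},\ \hat\zeta - w^{(\alpha)}]$,    (2.10)

where $w^{(\alpha)}$ is the $100\cdot\alpha$ percentile point for $W$,
$\mathrm{Prob}\{W < w^{(\alpha)}\} = \alpha$.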

We will use the notation $\theta[\alpha]$ for the $\alpha$-level endpoint of a confidence
interval for a parameter $\theta$. For example (2.10) says that
$\zeta[\alpha] = \hat\zeta - w^{(1-\alpha)}$, $\zeta[1-\alpha] = \hat\zeta - w^{(\alpha)}$.
The interval (2.10) can be transformed back to the $\phi$ scale by the inverse mappings
$\hat\phi = (e^{\hat\zeta} - 1)/a$, $\phi = (e^{\zeta} - 1)/a$, $(Z - z_0) = (e^{W} - 1)/a$.
A little algebraic manipulation shows that the resulting interval for $\phi$ has
$\alpha$-level endpoint

    $\phi[\alpha] = \hat\phi + \sigma_{\hat\phi}\,\frac{z_0 + z^{(\alpha)}}{1 - a(z_0 + z^{(\alpha)})}$.    (2.11)


The cdf of $\hat\phi$ according to (2.4) is $\Phi((s - \phi)/\sigma_\phi + z_0)$, so the
bootstrap cdf of $\hat\phi^*$, say $\hat G_\phi$, is
$\hat G_\phi(s) = \Phi((s - \hat\phi)/\sigma_{\hat\phi} + z_0)$. This has inverse
$\hat G_\phi^{-1}(\alpha) = \hat\phi + \sigma_{\hat\phi}\{\Phi^{-1}(\alpha) - z_0\}$, which
shows that $\hat G_\phi^{-1}(\Phi(z[\alpha]))$ equals (2.11). In other words, the BCa interval
for $\phi$, based on $\hat\phi$, coincides with the correct interval (2.11), correct meaning
in agreement with the translation problem interval (2.10).
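
The agreement asserted in this step of the proof can be checked numerically. The sketch
below computes the $\alpha$-level endpoint on the $\phi$ scale in three ways: by mapping
the translation interval (2.10) back through the logarithm, by the closed form (2.11), and
by evaluating $\hat G_\phi^{-1}(\Phi(z[\alpha]))$. It assumes $a > 0$ (so that $W$ is
increasing in $Z$) and positive bracketed terms in (2.8); the function names and the value
of $\hat\phi$ are illustrative.

```python
import math
from statistics import NormalDist

NORM = NormalDist()


def endpoint_via_translation(phi_hat, z0, a, alpha):
    """Map zeta[alpha] = zeta_hat - w^(1-alpha) from (2.10) back to the phi scale."""
    zeta_hat = math.log(1 + a * phi_hat)
    w_upper = math.log(1 + a * (NORM.inv_cdf(1 - alpha) - z0))   # w^(1-alpha), a > 0
    return (math.exp(zeta_hat - w_upper) - 1) / a


def endpoint_2_11(phi_hat, z0, a, alpha):
    """Closed form (2.11)."""
    s = z0 + NORM.inv_cdf(alpha)
    return phi_hat + (1 + a * phi_hat) * s / (1 - a * s)


def endpoint_via_bootstrap_cdf(phi_hat, z0, a, alpha):
    """G_phi^{-1}(Phi(z[alpha])), with G_phi^{-1}(p) = phi_hat + sigma*(Phi^{-1}(p) - z0)."""
    s = z0 + NORM.inv_cdf(alpha)
    z_bracket = z0 + s / (1 - a * s)                             # (2.7)
    p = NORM.cdf(z_bracket)
    return phi_hat + (1 + a * phi_hat) * (NORM.inv_cdf(p) - z0)


args = (0.5, 0.1082, 0.1077)               # phi_hat (hypothetical), z0, a
lower = [f(*args, 0.05) for f in
         (endpoint_via_translation, endpoint_2_11, endpoint_via_bootstrap_cdf)]
upper = [f(*args, 0.95) for f in
         (endpoint_via_translation, endpoint_2_11, endpoint_via_bootstrap_cdf)]
```

All three routes should return the same endpoint up to floating-point rounding.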


The BCa intervals transform in the obvious way: if $\phi = g(\theta)$,
$\hat\phi = g(\hat\theta)$, then the BCa interval endpoints satisfy
$\phi[\alpha] = g(\theta[\alpha])$ (because each bootstrap realization $\hat\phi^*_b$ equals
$g(\hat\theta^*_b)$, so that all the percentiles of the two bootstrap distributions map in the
same way, $\hat G_\phi^{-1}(\alpha) = g(\hat G^{-1}(\alpha))$). This verifies Lemma 1: the
transformations $\theta \to \phi \to \zeta$ and $\hat\theta \to \hat\phi \to \hat\zeta$ reduce
the problem to translation form (2.9); the inverse transformations of the natural interval
(2.10) for $\zeta$ produce the BCa interval (2.6), (2.7).


3. The Acceleration Constant a.

The BCa intervals (2.6), (2.7) require the statistician to calculate the bootstrap
distribution $\hat G$, and also the two constants $z_0$ and $a$. The bias-correction constant
$z_0$ equals $\Phi^{-1}(\hat G(\hat\theta))$, (2.2), and so can be computed directly from
$\hat G$. What about $a$? If we need to know the transformation $g$ leading to the normalized
problem (2.4), (2.5) in which $a$ was defined, then the BCa method is practically useless.
Fortunately there is a simple way to calculate $a$ without knowledge of $g$.

Suppose first that $\hat\theta$ has density function $f_\theta$ depending only on the real
parameter $\theta$, as in Lemma 1. In this section we will show that a good approximation for
the constant $a$ is

    $a \doteq \frac{1}{6}\,\mathrm{SKEW}_{\theta=\hat\theta}(\dot\ell_\theta)$,    (3.1)

where $\mathrm{SKEW}_{\theta=\hat\theta}(X)$ indicates the skewness of a random variable $X$,
$\mu_3(X)/\mu_2(X)^{3/2}$, evaluated at parameter point $\theta$ equals $\hat\theta$, and
$\dot\ell_\theta$ is the score function

    $\dot\ell_\theta(\hat\theta) = \frac{\partial}{\partial\theta}\log f_\theta(\hat\theta)$.    (3.2)

Formula (3.1) allows us to calculate $a$ in terms of the given density $f_\theta$, without
knowing $g$. Sections 5 and 6 discuss the computation of $a$ in families with nuisance
parameters. Section 9 gives a deeper discussion of $a$, and its relationship to other
quantities of interest.

Example: For the situation $\hat\theta \sim \theta(\chi^2_{19}/19)$ in Table 2, standard
$\chi^2$ calculations give $\mathrm{SKEW}(\dot\ell_\theta)/6 = [8/(19\cdot 36)]^{1/2} = .1081$,
which is quite close to the actual value $a = .1077$ derived in Section 9.
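
Formula (3.1) can also be evaluated by simulation when the skewness of the score is not
available in closed form. The following sketch does this for the $\chi^2_{19}$ scale
family, where the closed form is known and serves as a check. The score expression is
derived from the density of $\hat\theta \sim \theta\chi^2_\nu/\nu$; the function names
and Monte Carlo settings are illustrative assumptions.

```python
import math
import random


def score(theta_hat_star, theta, nu=19):
    """d/dtheta of log f_theta(theta_hat) for theta_hat ~ theta * chi^2_nu / nu."""
    return -nu / (2 * theta) + nu * theta_hat_star / (2 * theta ** 2)


def skewness(xs):
    """Sample skewness mu_3 / mu_2^(3/2)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(3)
theta_hat = 1.0
# Draws of theta_hat* from the density at theta = theta_hat, as (3.1) requires.
draws = [theta_hat * rng.gammavariate(19 / 2, 2) / 19 for _ in range(200_000)]
a_mc = skewness([score(t, theta_hat) for t in draws]) / 6
a_closed = math.sqrt(8 / (19 * 36))
```

Because the score is linear in $\hat\theta$ here, its skewness is that of $\chi^2_{19}$,
which is why the closed form reduces to $[8/(19\cdot 36)]^{1/2}$.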

If we make smooth one-to-one transformations $\hat\phi = g(\hat\theta)$, $\phi = h(\theta)$,
then $\dot\ell_\phi(\hat\phi) = \dot\ell_\theta(\hat\theta)/h'(\theta)$, and
$\mathrm{SKEW}(\dot\ell_\phi) = \mathrm{SKEW}(\dot\ell_\theta)$. In other words, the right
side of (3.1) is invariant under all mappings of this type. Suppose that for some choice of
$g$ and $h$, we can represent the family of distributions of $\hat\phi$ as

    $\hat\phi = \phi + \sigma_\phi\, q(Z)$,  $(Z \sim N(0,1))$,    (3.3)

where $\sigma_\phi$ and $q(Z)$ are functions of $\phi$ and $Z$, having at least one and two
derivatives respectively. Situation (3.3) is called a general scaled transformation family
(GSTF) in Efron (1982A).

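Lemma 2. The family (3.3) has score function $\dot\ell_\phi(\hat\phi)$ satisfying

    $\sigma_\phi\,\dot\ell_\phi(\hat\phi) \sim \left[Z + \frac{q''(Z)}{q'(Z)}\right]\left[\frac{1 + \dot\sigma_\phi\, q(Z)}{q'(Z)}\right] - \dot\sigma_\phi$,  $(Z \sim N(0,1))$.    (3.4)

Here $\dot\sigma_\phi = d\sigma_\phi/d\phi$ and $q'$ and $q''$ are the first two derivatives
of $q$.

Before presenting the proof of Lemma 2, we note that it verifies (3.1): in situation (2.4),
(2.5), where $\dot\sigma_\phi = a$, $q(Z) = Z - z_0$, $q'(Z) = 1$, $q''(Z) = 0$, the
distributional relationship (3.4) becomes

    $\sigma_\phi\,\dot\ell_\phi(\hat\phi) \sim (1 - a z_0)\left[Z + \frac{a}{1 - a z_0}(Z^2 - 1)\right]$.    (3.5)

Let

    $\epsilon_0 = \frac{a}{1 - a z_0}$,    (3.6)

a quantity discussed in Section 9. From the moments of $Z \sim N(0,1)$,

    $\frac{\mathrm{SKEW}(\dot\ell_\phi)}{6} = \epsilon_0\,\frac{1 + \frac{4}{3}\epsilon_0^2}{(1 + 2\epsilon_0^2)^{3/2}}$.    (3.7)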


We will see in Section 9 that for the usual repeated sampling situation both $a$ and $z_0$
are order of magnitude $O(n^{-1/2})$ in the sample size $n$. This means that
$\epsilon_0 = a\cdot[1 + O(n^{-1})]$, (3.6), and that
$\mathrm{SKEW}(\dot\ell_\theta)/6 = \mathrm{SKEW}(\dot\ell_\phi)/6 = a[1 + O(n^{-1})]$, (3.7),
justifying approximation (3.1). The "constant" $a$ actually depends on $\theta$, but
substituting $\theta = \hat\theta$ in (3.1) causes errors only at the third order level,
$\hat\sigma B_n^{(\alpha)}/n$ in (1.2), and so doesn't affect the second order properties of
the BCa intervals.
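
The moment calculation behind (3.7) can be checked by simulation. The sketch below
evaluates (3.6) and the right side of (3.7) at the Table 2 constants and compares the
result with a Monte Carlo estimate of the skewness of $Z + \epsilon_0(Z^2 - 1)$, the
bracketed variable in (3.5). The function names and simulation size are assumptions made
for illustration.

```python
import random


def eps0(a, z0):
    """(3.6): epsilon_0 = a / (1 - a * z0)."""
    return a / (1 - a * z0)


def skew_over_6(eps):
    """Right side of (3.7)."""
    return eps * (1 + 4 * eps ** 2 / 3) / (1 + 2 * eps ** 2) ** 1.5


e = eps0(0.1077, 0.1082)
rhs = skew_over_6(e)

# Monte Carlo skewness of Z + eps0 * (Z^2 - 1); the factor (1 - a*z0) in (3.5)
# is a positive constant and does not change the skewness.
rng = random.Random(4)
xs = []
for _ in range(200_000):
    z = rng.gauss(0.0, 1.0)
    xs.append(z + e * (z * z - 1))
n = len(xs)
mean = sum(xs) / n
m2 = sum((x - mean) ** 2 for x in xs) / n
m3 = sum((x - mean) ** 3 for x in xs) / n
mc = m3 / m2 ** 1.5 / 6
```

At these constants $\epsilon_0$ and the right side of (3.7) both stay within about one
percent of $a = .1077$, in line with the $1 + O(n^{-1})$ factors in the text.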

Proof of Lemma 2: Starting from (3.3), the cdf of $\hat\phi$ is
$\Phi(q^{-1}((\hat\phi - \phi)/\sigma_\phi))$, so $\hat\phi$ has density
$f_\phi(\hat\phi) = e^{-Z_\phi^2/2}/(\sqrt{2\pi}\,\sigma_\phi\, q'(Z_\phi))$, where
$Z_\phi = q^{-1}((\hat\phi - \phi)/\sigma_\phi)$. This gives log likelihood function

    $\ell_\phi(\hat\phi) = -\tfrac{1}{2}Z_\phi^2 - \log q'(Z_\phi) - \log(\sigma_\phi)$.    (3.8)

Lemma 2 follows by differentiating (3.8) with respect to $\phi$, and noting that
$Z_\phi \sim N(0,1)$ when sampling from (3.3).

Suppose $z_0 = 0$ and $a > 0$ in (2.4), (2.5). Having observed $\hat\phi = 0$, and noticing
$\sigma_{\hat\phi} = 1$, the naive interval for $\phi$ (which is almost the same as the
standard interval (1.1)) is $\phi \in [z^{(\alpha)}, z^{(1-\alpha)}]$. However if the
statistician checks the situation at the right endpoint $z^{(1-\alpha)}$, he finds that the
hypothesized standard deviation of $\hat\phi$ has increased from 1 to $1 + a z^{(1-\alpha)}$.
This suggests increasing the right endpoint to $z^{(1-\alpha)}(1 + a z^{(1-\alpha)})$. Now
the hypothesized standard deviation has further increased to
$1 + a z^{(1-\alpha)}(1 + a z^{(1-\alpha)})$, suggesting a still larger right endpoint, etc.
Continuing in this way leads to formula (2.11). [Improving the standard interval (1.1) by
recomputing $\hat\sigma$ at its endpoints is a useful idea. It was brought to my attention by
John Tukey, who pointed out its use by Bartlett (1953), see for instance Bartlett's equation
(17). Tukey's 1949 unpublished talk anticipates many of the same points.]
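
The successive corrections of the right endpoint form a fixed-point iteration
$x \leftarrow z^{(1-\alpha)}(1 + a x)$. A minimal sketch, assuming $z_0 = 0$,
$\hat\phi = 0$, and $|a z^{(1-\alpha)}| < 1$ so that the iteration converges; the
function name and iteration count are illustrative.

```python
from statistics import NormalDist


def iterate_endpoint(z, a, n_iter=50):
    """Repeatedly rescale the endpoint z by the standard deviation 1 + a*x found at x."""
    x = z
    for _ in range(n_iter):
        x = z * (1 + a * x)
    return x


a = 0.1077
z_hi = NormalDist().inv_cdf(0.95)
limit = iterate_endpoint(z_hi, a)
closed_form = z_hi / (1 - a * z_hi)    # (2.11) with z0 = 0, phi_hat = 0, sigma = 1
```

The limit solves $x = z^{(1-\alpha)}(1 + a x)$, which rearranges to the closed form on
the last line.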

We will call $a$ the acceleration constant in what follows because of its effect of
constantly changing the natural units of measurement as we move along the $\phi$ (or
$\theta$) axis. Notice that we can write (2.5) as

    $\sigma_\phi = \sigma_{\phi_0}\left[1 + a\,\frac{\phi - \phi_0}{\sigma_{\phi_0}}\right]$,    (3.9)

so

    $a = \frac{d(\sigma_\phi/\sigma_{\phi_0})}{d((\phi - \phi_0)/\sigma_{\phi_0})}$    (3.10)

for any fixed value of $\phi_0$. This shows that $a$ is the relative change in $\sigma_\phi$
per unit standard deviation change in $\phi$, no matter what value $\phi$ has.

The point $\phi_0 = 0$ is favored in definition (2.5) since we have set $\sigma_0$ equal to
the convenient value 1. There is no harm in thinking of 0 as the true value of $\phi$, the
value actually governing the distribution of $\hat\phi$ in (2.4), because in theory we can
always choose the transformation $g$ so that this is the case, and also so that
$\sigma_0 = 1$. (See Remark A, Section 10.) The restriction $1 + a\phi > 0$ in (2.5) causes
no practical trouble for $|a| \le .2$, since it is then at least 5 standard deviations to the
boundary of the permissible $\phi$ region.


4. Second Order Correctness of the BCa Intervals.

The standard interval (1.1) is based on taking literally the asymptotic approximation

    $\frac{\hat\theta - \theta}{\hat\sigma} \sim N(0, 1)$.    (4.1)

The BC method assumes that a more general approximation holds,

    $\frac{g(\hat\theta) - g(\theta)}{\sigma_g} \sim N(-z_0, 1)$,    (4.2)

for some constant $z_0$ and monotone transformation $g$, where
$\sigma_g = [\mathrm{Var}_\theta\, g(\hat\theta)]^{1/2}$. The BCa method relaxes the
assumptions one step further, to

    $\frac{g(\hat\theta) - g(\theta)}{\sigma_g} \sim N(-z_0, (1 + a g(\theta))^2)$.    (4.3)

The difference between (4.2) and (4.3) is greater than it seems: the hypothesized ideal
transformation $g$ in (4.2) has to be both normalizing and variance stabilizing, while in
(4.3), $g$ need be only normalizing. Efron (1982A) shows that normalization and stabilization
are partially antagonistic goals in familiar families such as the Poisson and the binomial.

It is not surprising that a theory based on (4.3) is usually more accurate than a theory
based on (4.1). In fact applied statisticians make frequent use of devices like those in
(4.3), transformations, bias corrections, and even acceleration corrections, to improve the
performance of approximation (4.1). The advantage of the BCa method is that it automates
these improvements, so that the statistician doesn't have to think them through anew for each
new application.

Is it possible to go beyond (4.3), to find still further improvements over (4.1)? The answer
is no, at least not in terms of second-order asymptotics. The theorem of this section states
that for simple one-parameter problems the BCa intervals coincide through second order with
the exact intervals. In terms of (1.2), the BCa intervals have the correct second-order
asymptotic form $\hat\theta + \hat\sigma(z^{(\alpha)} + A_n^{(\alpha)}/\sqrt{n} + \cdots)$.

We consider the simple one-parameter problem $\hat\theta \sim f_\theta$, supposing that the
$100\cdot\alpha$ percentile of $\hat\theta$ as a function of $\theta$, say
$\hat\theta_\theta^{(\alpha)}$, is a continuously increasing function of $\theta$, for any
fixed $\alpha$. In this case the usual confidence interval construction gives an exact
$1-2\alpha$ central interval for $\theta$ having observed $\hat\theta$, say
$[\theta_{Ex}[\alpha],\ \theta_{Ex}[1-\alpha]]$, where $\theta_{Ex}[\alpha]$ is the value of
$\theta$ satisfying $\hat\theta_\theta^{(1-\alpha)} = \hat\theta$. The exact interval in
Table 2 is an example of this construction.
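
The percentile-inversion construction can be carried out numerically. The sketch below
applies it to the Table 2 situation $\hat\theta \sim \theta\chi^2_{19}/19$, solving
$\hat\theta_\theta^{(1-\alpha)} = \hat\theta$ for $\theta$ by bisection. The $\chi^2$
cdf is evaluated with a standard series for the incomplete gamma function; the function
names, bracketing interval, and tolerances are assumptions made for illustration.

```python
import math


def chi2_cdf(x, nu):
    """P(chi^2_nu <= x) via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    s, h = nu / 2, x / 2
    term = total = 1.0 / s
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= h / (s + k)
        total += term
    return total * math.exp(s * math.log(h) - h - math.lgamma(s))


def bisect(f, lo, hi, tol=1e-10):
    """Root of a monotone function f on [lo, hi], assuming a sign change."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (f_lo > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_endpoint(theta_hat, alpha, nu=19):
    """theta_Ex[alpha]: theta whose 100(1-alpha) percentile of theta-hat equals theta_hat."""
    # Under theta, P(theta-hat* <= theta_hat) = chi2_cdf(nu * theta_hat / theta, nu).
    return bisect(lambda th: chi2_cdf(nu * theta_hat / th, nu) - (1 - alpha),
                  theta_hat / 20, theta_hat * 20)


interval = (exact_endpoint(1.0, 0.05), exact_endpoint(1.0, 0.95))
```

With $\hat\theta$ scaled to 1, the two endpoints should reproduce line 1 of Table 2 to
the accuracy shown there.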

It isn't necessary that $\hat\theta$ be the MLE of $\theta$. In (2.4) for instance
$\hat\phi$ is not the MLE of $\phi$. (The BCa method is quite insensitive to small changes in
the form of the estimator, see Remark B, Section 10.) However we will assume that
$\hat\theta$ behaves asymptotically like the MLE in terms of the orders of magnitude of its
bias, standard deviation, skewness, and kurtosis,

    $\hat\theta - \theta \sim \left(\frac{B_\theta}{n},\ \frac{C_\theta}{\sqrt{n}},\ \frac{D_\theta}{\sqrt{n}},\ \frac{E_\theta}{n}\right)$.    (4.4)

Here $n$ is the sample size upon which the summary statistic $\hat\theta$ is based;
$B_\theta$, $C_\theta$, $D_\theta$, and $E_\theta$ are bounded functions of $\theta$ (and of
$n$, which is suppressed in the notation). Then (4.4) says that the bias of $\hat\theta$,
$B_\theta/n$, is $O(n^{-1})$, the standard deviation $C_\theta/\sqrt{n}$ is $O(n^{-1/2})$,
skewness $O(n^{-1/2})$, and kurtosis $O(n^{-1})$. Higher cumulants, which are typically of
order smaller than $O(n^{-1})$, will be assumed negligible in proving the results which
follow. See Hougaard (1982) and DiCiccio (1984).

The asymptotics of this paper are stated relative to the size of the estimated standard error
$\hat\sigma$ of $\hat\theta$, as in (1.2). It is often convenient in what follows to have
$\hat\sigma$ be $O(1)$. This is easy to accomplish by transforming to
$\hat\phi = \sqrt{n}\,\hat\theta$, $\phi = \sqrt{n}\,\theta$, so (4.4) becomes

    $\hat\phi - \phi \sim (\beta_\phi, \sigma_\phi, \gamma_\phi, \delta_\phi)$,    (4.5)

where $\beta_\phi = B_\theta/\sqrt{n}$, $\sigma_\phi = C_\theta$,
$\gamma_\phi = D_\theta/\sqrt{n}$, and $\delta_\phi = E_\theta/n$. Notice that
$\dot\beta_\phi \equiv d\beta_\phi/d\phi = (dB_\theta/d\theta)/n = O(n^{-1})$, etc. We can
just assume to begin with that $\hat\theta$ and $\theta$ are the rescaled quantities called
$\hat\phi$ and $\phi$ above. Then the following orders of magnitude apply.

    $O(1)$:          $\sigma_\theta$
    $O(n^{-1/2})$:   $\dot\sigma_\theta,\ \beta_\theta,\ \gamma_\theta$
    $O(n^{-1})$:     $\delta_\theta,\ \dot\beta_\theta,\ \dot\gamma_\theta$
    $O(n^{-3/2})$:   $\dot\delta_\theta$    (4.6)


Theorem 1. If $\hat\theta$ has bias $\beta_\theta$, standard error $\sigma_\theta$, skewness
$\gamma_\theta$, and kurtosis $\delta_\theta$ satisfying (4.6), then the BCa intervals are
second order correct.

The theorem states that $\theta_{BC_a}[\alpha]$, the $\alpha$-endpoint of the BCa interval,
is asymptotically close to the exact endpoint,

    $\frac{\theta_{BC_a}[\alpha] - \theta_{Ex}[\alpha]}{\hat\sigma} = O(n^{-1})$.    (4.7)

This isn't true for standard intervals (1.1) or the BC intervals (2.3). The proof of
Theorem 1, which appears in Section 11, makes it clear that all three of the elements in
(4.3), the transformation $g$, the bias-correction constant $z_0$, and the acceleration
constant $a$, make necessary corrections of $O(n^{-1/2})$ to the standard intervals based on
(4.1).

McCullagh (1984) and Cox (1980) give an interesting approximate confidence interval for
$\theta$, having $\alpha$-endpoint

    $\theta_{APP}[\alpha] = \hat\theta + \hat k_2^{-1/2}\left[z^{(\alpha)} + \frac{(3\hat{\dot k}_2 + 2\hat k_{001}) + \hat k_{001}\, z^{(\alpha)2}}{6\,\hat k_2^{3/2}}\right]$.    (4.8)

Here $\hat\theta$ is the MLE of $\theta$; if $k_2(\theta) = E_\theta\,\dot\ell_\theta^{\,2}$,
the Fisher information, then $\hat k_2 = k_2(\hat\theta)$ and
$\hat{\dot k}_2 = dk_2(\theta)/d\theta\,|_{\theta=\hat\theta}$; and
$\hat k_{001} = (E_\theta\,\dddot\ell_\theta)_{\theta=\hat\theta}$. Formula (4.8) is based on
higher-order asymptotic approximations to the distribution of the MLE. See also
Barndorff-Nielsen (1984).

It can be shown, as indicated in Section 11, that $\theta_{BC_a}[\alpha]$ also closely
matches (4.8), $(\theta_{BC_a}[\alpha] - \theta_{APP}[\alpha])/\hat\sigma = O(n^{-1})$. We
see again that the BCa method offers a way to avoid theoretical effort, at the expense of
intense computer computations.


5. Multiparameter Problems.

The discussion so far has centered on the simple case $\hat\theta \sim f_\theta$, where we
have only a real-valued parameter $\theta$ and a real-valued summary statistic $\hat\theta$
from which we are trying to construct a confidence interval for $\theta$. We have been able
to show favorable properties of the BCa intervals for the simple case, but of course the
simple case is where we least need a general method like the bootstrap.

This section discusses the more difficult situation where there are nuisance parameters
besides the parameter of interest $\theta$. Section 6 discusses the nonparametric situation,
where the number of nuisance parameters is effectively infinite. Because of the inherently
simple nature of the bootstrap it will be easy to extend the BCa method to cover these cases,
though we will not be able to provide as strong a justification for the correctness of the
resulting intervals.

Suppose then that the data $y$ comes from a parametric family $\mathcal{F}$ of density
functions $f_\eta$, where $\eta$ is an unknown vector of parameters, and we want a confidence
interval for the real-valued parameter $\theta = t(\eta)$. In Efron (1984), the multivariate
normal case $y \sim N_k(\eta, I)$ is examined in detail.

From $y$ we obtain $\hat\eta$, the MLE of $\eta$, and $\hat\theta = t(\hat\eta)$, the MLE of
$\theta$. The BC interval for $\theta$, (2.3), is obtained as indicated at the beginning of
Section 2: step (i) of the bootstrap algorithm is to sample
$y^*_1, y^*_2, \ldots, y^*_B \sim f_{\hat\eta}$, giving
$\hat\theta^*_b = t(\hat\eta(y^*_b))$, etc. However in order to obtain the BCa intervals, we
also need to know the appropriate value of $a$, the acceleration constant. We will find $a$
by following Stein's (1956) construction, which replaces the multiparameter family
$\mathcal{F} = \{f_\eta\}$ by a least favorable one-parameter family $\hat{\mathcal{F}}$.

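Let $\dot\ell_\eta$ be the vector with $i$th coordinate $\frac{\partial}{\partial\eta_i}\log f_\eta(y)$, so $\dot\ell_{\hat\eta}(y) = 0$ by definition of the MLE $\hat\eta$, and let $\ddot\ell_{\hat\eta}$ be the $k \times k$ matrix with $ij$th entry $\frac{\partial^2}{\partial\eta_i\,\partial\eta_j}\log f_\eta(y)\,|_{\eta=\hat\eta}$. Also let $\hat\nabla$ be the gradient vector of $\theta = t(\eta)$ evaluated at the MLE, $\hat\nabla_i = \frac{\partial}{\partial\eta_i}\,t(\eta)\,|_{\eta=\hat\eta}$. The least favorable direction at $\eta = \hat\eta$ is defined to be

$$\hat\mu = (-\ddot\ell_{\hat\eta})^{-1}\,\hat\nabla. \tag{5.1}$$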

Then the least favorable family $\hat{\mathcal F}$ is the one-parameter subfamily of $\mathcal F$ passing through $\hat\eta$ in the direction $\hat\mu$,

$$\hat{\mathcal F} = \{\hat f_\tau(y^*) \equiv f_{\hat\eta+\tau\hat\mu}(y^*)\}. \tag{5.2}$$

Using $y^*$ to denote a hypothetical data vector from $\hat f_\tau$ is intended to avoid confusion with the actual data vector $y$ which gave $\hat\eta$; $\hat\eta$ and $\hat\mu$ are fixed in (5.2), only $\tau$ being unknown.

Consider the problem of estimating $\theta(\tau) \equiv t(\hat\eta+\tau\hat\mu)$ having observed $y^* \sim \hat f_\tau$. The Fisher information bound for an unbiased estimate of $\theta$ in this one-parameter family, evaluated at $\tau = 0$, equals $\hat\nabla'(-\ddot\ell_{\hat\eta})^{-1}\hat\nabla$, which is the same as the corresponding bound for estimating $\theta = t(\eta)$, at $\eta = \hat\eta$, in the multiparameter family $\mathcal F$. This is Stein's reason for calling $\hat{\mathcal F}$ least favorable.
We will use $\hat{\mathcal F}$ to calculate an approximate value for the acceleration constant $a$,

$$a \equiv \frac{1}{6}\,\mathrm{SKEW}_{\tau=0}\!\left(\frac{\partial\log\hat f_\tau(y^*)}{\partial\tau}\right). \tag{5.3}$$

This is formula (3.1) applied to $\hat{\mathcal F}$, assuming that $\tau = 0$ (which is the MLE of $\tau$ in $\hat{\mathcal F}$ when $y^* = y$, the actual data vector).

Formula (5.3) is especially simple in the exponential family case where the densities $f_\eta(y)$ are of the form

$$f_\eta(y) = e^{\,n[\eta' y - \psi(\eta)]}. \tag{5.4}$$

The factor $n$ in the exponent of (5.4) isn't necessary, but is included to agree with the situation where the data consists of i.i.d. observations $x_1, x_2, \ldots, x_n$, each with density $e^{\,\eta' x - \psi(\eta)}$, and $y$ is the sufficient vector $\sum x_i/n$.


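Lemma 3.  For the exponential family (5.4), formula (5.3) gives

$$a = \frac{1}{6\sqrt{n}}\,\frac{\hat\psi^{(3)}(0)}{(\hat\psi^{(2)}(0))^{3/2}}, \tag{5.5}$$

where

$$\hat\psi^{(j)}(0) \equiv \frac{\partial^j\,\psi(\hat\eta+\tau\hat\mu)}{\partial\tau^j}\bigg|_{\tau=0}. \tag{5.6}$$

Proof:  We have

$$\frac{\partial\log\hat f_\tau(y^*)}{\partial\tau}\bigg|_{\tau=0} = n\,\hat\mu'\,(y^* - \dot\psi(\hat\eta)), \tag{5.7}$$

so $\mathrm{SKEW}_{\tau=0}(\partial\log\hat f_\tau/\partial\tau)$ equals the skewness of $\hat\mu' y^*$ for $y^* \sim f_{\hat\eta}$. The fact that $\mathrm{SKEW}(\hat\mu' y^*)$ equals $[\hat\psi^{(3)}(0)/(\hat\psi^{(2)}(0))^{3/2}]/\sqrt{n}$ is a standard exercise in exponential family theory.

Note: Lemma 3 applies to $y \sim N_k(\eta, I)$, the case considered in Efron (1984), and gives $a = 0$, which is why the unaccelerated BC intervals worked well there.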
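Lemma 3 can be checked numerically in simple cases. The sketch below is illustrative only: the function names, the finite-difference step, and the two test families are assumptions made here, not part of the paper. It evaluates (5.5) by differencing $\psi(\hat\eta+\tau\hat\mu)$ in $\tau$, first for a Poisson mean (where the skewness of the score is known in closed form) and then for the normal case, where the result should be zero.

```python
import math

def acceleration_lemma3(psi, eta_hat, mu_hat, n, h=1e-2):
    """Evaluate (5.5): a = psi3 / (6 sqrt(n) psi2^{3/2}), where psi2 and psi3 are the
    2nd and 3rd derivatives of tau -> psi(eta_hat + tau * mu_hat) at tau = 0,
    approximated here by central finite differences (an illustrative choice)."""
    def p(tau):
        return psi([e + tau * m for e, m in zip(eta_hat, mu_hat)])
    psi2 = (p(h) - 2.0 * p(0.0) + p(-h)) / h**2
    psi3 = (p(2 * h) - 2.0 * p(h) + 2.0 * p(-h) - p(-2 * h)) / (2.0 * h**3)
    return psi3 / (6.0 * math.sqrt(n) * psi2**1.5)

# Hypothetical check 1: n i.i.d. Poisson(lam) observations, psi(eta) = exp(eta).
# The score is proportional to the sample sum, whose skewness is 1/sqrt(n*lam).
lam, n = 4.0, 10
a_poisson = acceleration_lemma3(lambda eta: math.exp(eta[0]), [math.log(lam)], [1.0], n)
a_poisson_exact = 1.0 / (6.0 * math.sqrt(n * lam))

# Rescaling the direction mu_hat should leave a unchanged.
a_poisson_scaled = acceleration_lemma3(lambda eta: math.exp(eta[0]), [math.log(lam)], [3.0], n)

# Hypothetical check 2: y ~ N_4(eta, I), psi(eta) = ||eta||^2 / 2, so a should be 0.
a_normal = acceleration_lemma3(lambda eta: 0.5 * sum(e * e for e in eta),
                               [8.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], 1)

print(a_poisson, a_poisson_exact, a_normal)
```

The rescaling check mirrors the remark made later in Section 7 that (5.5), (5.6) give the same value of $a$ when $\hat\mu$ is multiplied by a constant.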

Table 3 relates to the following example:

$$y \sim N_4(\eta, \sigma_\eta^2 I) \qquad [\sigma_\eta = 1 + a(\|\eta\| - 8)], \tag{5.8}$$

where we observe $y = (8, 0, 0, 0)$ and wish to set confidence intervals for the parameter $\theta = t(\eta) = \|\eta\|$. The case $a = 0$ amounts to finding a confidence interval for the noncentrality parameter of a noncentral $\chi^2$ distribution, and can be solved exactly. The theory of Efron (1984) applies to the $a = 0$ case, and we see that the $BC_0$ interval, i.e. the ordinary BC interval (2.3), matches the exact interval well.


            Exact          (R/L)    BC_a           (R/L)    (5.3)
a =  .10    [6.46, 9.69]   (.96)    [6.47, 9.70]   (.57)     .0984
a =  .05    [6.32, 9.57]   (.85)    [6.34, 9.56]   (.84)     .0498
a =   0     [6.14, 9.47]   (.74)    [6.19, 9.44]   (.75)     0
a = -.05    [5.92, 9.38]   (.65)    [6.03, 9.35]   (.66)    -.0498
a = -.10    [5.62, 9.30]   (.56)    [5.89, 9.27]   (.60)    -.0984

Table 3.  Central 90% confidence intervals for $\theta = \|\eta\|$, having observed $\|y\| = 8$, from the parametric family $y \sim N_4(\eta, \sigma_\eta^2 I)$, with $\sigma_\eta = 1 + a(\|\eta\| - 8)$. The standard interval (1.1) is [6.36, 9.64] for all values of $a$. The last column shows that (5.3) nearly equals the constant $a$ in this case. The exact intervals are based on the noncentral $\chi^2$ distribution.

Table 3 shows the result of varying the constant $a$ from .10 to -.10. This example has a particularly simple geometry: the sphere $C_{\hat\theta} = \{\eta: \|\eta\| = \hat\theta\}$ is the set of $\eta$ vectors having $t(\eta)$ equal to the MLE value $\hat\theta = t(\hat\eta)$; the least favorable direction $\hat\mu$ is orthogonal to $C_{\hat\theta}$ at $\hat\eta$; the distribution of $\hat\theta$ is nearly normal (see Table 2 of Efron, 1984), with standard deviation changing at a rate nearly equal to $a$, as in (3.10). The $BC_a$ intervals alter predictably with $a$. For instance, comparing the upper endpoint at $a = .10$ with that at $a = 0$, notice that $(9.70 - 8.00)/(9.44 - 8.00) = 1.18$, closely matching the obvious expansion factor due to acceleration, $1 + .10 \cdot 1.645 = 1.16$.

We could disguise problem (5.8) by making nonlinear transformations

$$\tilde y = g(y), \qquad \tilde\eta = h(\eta), \tag{5.9}$$

in which case the geometry of the $BC_a$ intervals might not be obvious from the form of the parameter $\theta = t(h^{-1}(\tilde\eta)) = \|h^{-1}(\tilde\eta)\|$ and the transformed densities $\tilde f_{\tilde\eta}(\tilde y)$. However, the $BC_a$ method is invariant under such transformations, see Remark C, Section 10, so the statistician would automatically get the same intervals as if he knew the normalizing transformations $y = g^{-1}(\tilde y)$, $\eta = h^{-1}(\tilde\eta)$.

Currently we cannot justify the $BC_a$ method as being second-order correct in the multiparameter context of this section, though it seems a likely conjecture that this is so. We know that it is so in the one-parameter case, Section 4, and in the restricted multiparameter case of Efron (1984), where the $BC_a$ and BC methods coincide; and that the $BC_a$ method makes a rather obvious correction to the BC interval in the general multiparameter case.

6.  The  Nonparametric  Case. 

This section concerns the nonparametric case where the data $y = (x_1, \ldots, x_n)$ consists of i.i.d. observations $x_i$ which may have come from any probability distribution $F$ on their common sample space $\mathcal X$. There is a real-valued parameter $\theta = t(F)$ for which we desire an approximate confidence interval. We will show how the $BC_a$ method can be used to provide such an interval based on the obvious nonparametric estimate $\hat\theta = t(\hat F)$. Here $\hat F$ is the empirical probability distribution of the sample, putting mass $1/n$ on each observed value $x_i$.
The nonparametric bootstrap cdf $\hat G$ of $\hat\theta^*$ is generated by following steps (i), (ii), (iii) at the beginning of Section 2: each bootstrap data set $y^*_b$ equals $(x^*_{1b}, x^*_{2b}, \ldots, x^*_{nb})$, an i.i.d. sample of size $n$ from $\hat F$. This gives a bootstrap empirical distribution $\hat F^*_b$ putting mass $1/n$ at each $x^*_{ib}$, and a corresponding bootstrap estimate $\hat\theta^*_b = t(\hat F^*_b)$. The observed standard deviation of the $\hat\theta^*_b$ values is the nonparametric bootstrap estimate of standard error for $\hat\theta$, Efron (1979). In this paper we are pursuing the more difficult task of constructing approximate confidence intervals from the bootstrap distribution.

At this point we could use $\hat G$ to form the BC interval (2.3), but for the $BC_a$ interval (2.6) we also need the value of $a$. We will derive a simple formula for $a$, based on Lemma 3. It depends on

$$U_i = \lim_{\Delta\to 0}\frac{t((1-\Delta)\hat F + \Delta\delta_i) - t(\hat F)}{\Delta}, \qquad (i = 1, 2, \ldots, n), \tag{6.1}$$

the empirical influence function of $\hat\theta = t(\hat F)$. Here $\delta_i$ is a point mass at $x_i$, so $U_i$ is the derivative of the estimate $\hat\theta$ with respect to the mass on point $x_i$. Definition (6.1) assumes that $t(F)$ is smoothly defined for choices of $F$ near $\hat F$, see Section 6.3 of Efron (1982), or Section 5 of Efron (1979). [Note: $\sum_i U_i = 0$.]

The next section shows that Lemma 3, applied to a family appropriate to the nonparametric situation, gives the following approximation for the constant $a$:

$$a \doteq \frac{1}{6}\,\frac{\sum_{i=1}^n U_i^3}{\left(\sum_{i=1}^n U_i^2\right)^{3/2}}. \tag{6.2}$$

This is a convenient formula since the $U_i$ can be evaluated easily using finite differences in definition (6.1).

Example 1:  The law school data.  Table 4 shows two indices of student excellence, LSAT and GPA, for each of 15 American law schools, see Section 2.2 of Efron (1982). The Pearson correlation coefficient $\hat\rho$ between LSAT and GPA equals .776; we want a confidence interval for the true correlation $\rho$. Table 4 also shows the values of $U_i$ for the statistic $\hat\rho$, from which formula (6.2) produces $a = -.0817$. $B = 100{,}000$ bootstrap replications (about 100 times more than actually needed, see Section 8) gave $z_0 = \Phi^{-1}(.463) = -.0927$, definition (2.2). Taking $\alpha = .05$ in (2.6), (2.7) resulted in the central 90% $BC_a$ interval [.43, .92] for $\rho$. The corresponding bivariate normal interval, based on Fisher's $\tanh^{-1}$ transformation, is [.49, .90]. The standard interval (1.1), $\hat\rho \pm 1.645\,\hat\sigma$, using the bootstrap estimate $\hat\sigma = .133$, is [.56, .99].

 i    (LSAT, GPA)      U_i          i    (LSAT, GPA)      U_i
 1    (576, 3.39)    -1.507         9    (651, 3.36)     0.310
 2    (635, 3.30)     0.168        10    (605, 3.13)     0.004
 3    (558, 2.81)     0.273        11    (653, 3.12)    -0.526
 4    (578, 3.03)     0.004        12    (575, 2.74)    -0.091
 5    (666, 3.44)     0.525        13    (545, 2.76)     0.434
 6    (580, 3.07)    -0.049        14    (572, 2.88)     0.125
 7    (555, 3.00)    -0.100        15    (594, 2.96)    -0.048
 8    (661, 3.43)     0.477

Table 4.  The law school data, and values of the empirical influence function for the correlation coefficient $\hat\rho$.
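The calculations of Example 1 can be sketched in a few lines. This is an illustrative reconstruction, not the author's program: the helper names, the finite-difference step `delta`, the seed, and the choice of B = 20,000 replications are all assumptions made here. The endpoint rule used is the usual $BC_a$ percentile adjustment, taken to be what (2.6), (2.7) denote; with Monte Carlo error the interval should land close to, but not exactly on, the quoted [.43, .92].

```python
import numpy as np
from statistics import NormalDist

LSAT = np.array([576, 635, 558, 578, 666, 580, 555, 661,
                 651, 605, 653, 575, 545, 572, 594], dtype=float)
GPA = np.array([3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43,
                3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96])

def weighted_corr(w, x, y):
    """Correlation of the distribution putting mass w[i] on (x[i], y[i])."""
    mx, my = np.sum(w * x), np.sum(w * y)
    sxy = np.sum(w * (x - mx) * (y - my))
    return sxy / np.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (y - my) ** 2))

def influence_values(x, y, delta=1e-6):
    """Finite-difference version of (6.1): shift mass delta toward point i."""
    n = len(x)
    w0 = np.full(n, 1.0 / n)
    t0 = weighted_corr(w0, x, y)
    U = np.empty(n)
    for i in range(n):
        w = (1.0 - delta) * w0
        w[i] += delta
        U[i] = (weighted_corr(w, x, y) - t0) / delta
    return U

def acceleration(U):
    """Formula (6.2)."""
    return np.sum(U ** 3) / (6.0 * np.sum(U ** 2) ** 1.5)

def bca_interval(x, y, a, alpha=0.05, B=20000, seed=0):
    """Nonparametric BC_a interval for the correlation (seed and B are arbitrary)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    idx = rng.integers(0, n, size=(B, n))          # B resamples of size n
    xc = x[idx] - x[idx].mean(axis=1, keepdims=True)
    yc = y[idx] - y[idx].mean(axis=1, keepdims=True)
    reps = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    theta_hat = weighted_corr(np.full(n, 1.0 / n), x, y)
    nd = NormalDist()
    z0 = nd.inv_cdf(float(np.mean(reps < theta_hat)))   # bias-correction constant
    def adjusted_level(level):
        z = nd.inv_cdf(level)
        return nd.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))
    lo = float(np.quantile(reps, adjusted_level(alpha)))
    hi = float(np.quantile(reps, adjusted_level(1.0 - alpha)))
    return lo, hi, z0

U = influence_values(LSAT, GPA)
a_hat = float(acceleration(U))
lo, hi, z0 = bca_interval(LSAT, GPA, a_hat)
print(np.round(U, 3), round(a_hat, 4), round(z0, 4), (round(lo, 2), round(hi, 2)))
```

The finite-difference $U_i$ can be compared line by line with Table 4, which is a useful check that the statistic has been coded as a function of the weights rather than of the raw sample.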

Formula (6.2) is invariant under monotone changes of the parameter of interest. This results in the $BC_a$ intervals having correct transformation properties. Suppose for example that we change parameters from $\rho$ to $\phi = g(\rho) = \tanh^{-1}(\rho)$, with corresponding nonparametric estimate $\hat\phi = g(\hat\rho)$. The central 90% $BC_a$ interval for $\phi$ based on $\hat\phi$ is then the obvious transformation of the interval for $\rho$ based on $\hat\rho$, $[g(.43), g(.92)] = [.46, 1.59]$. This compares with Fisher's $\tanh^{-1}$ interval $[g(.49), g(.90)] = [.54, 1.47]$ and the standard interval $\hat\phi \pm 1.645\,\hat\sigma_\phi = [.49, 1.59]$. The standard interval is much more reasonable-looking on the $\tanh^{-1}$ scale, as we might expect from Fisher's transformation theory. As commented before, a major advantage of the $BC_a$ method is that the statistician need not know the correct scale to work on. In effect the method selects the best (most normal) scale, and then transforms the interval back to the scale of interest.
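The endpoint mapping in this paragraph is easy to reproduce. The short sketch below simply applies $g = \tanh^{-1}$ to the quoted endpoints on the $\rho$ scale, taking the Fisher interval as [.49, .90] from Example 1; the variable names are illustrative.

```python
import math

g = math.atanh  # the transformation g(rho) = arctanh(rho)

bca_rho = (0.43, 0.92)      # BC_a interval for rho quoted in Example 1
fisher_rho = (0.49, 0.90)   # Fisher tanh^{-1} interval, expressed on the rho scale

bca_phi = tuple(round(g(r), 2) for r in bca_rho)
fisher_phi = tuple(round(g(r), 2) for r in fisher_rho)

print(bca_phi, fisher_phi)
```

Because $g$ is monotone increasing, mapping the endpoints is all that invariance requires; no new bootstrap run on the $\phi$ scale is needed.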

Example 2:  Mardia, Kent, and Bibby (1979), pages 3 and 234, give 5 test scores for each of $n = 88$ students. The principal eigenvector of the $5 \times 5$ sample covariance matrix accounts for $\hat\theta = .619$ of the total variation, i.e., .619 = (largest eigenvalue)/(sum of eigenvalues). Suppose we want a central 90% confidence interval for the corresponding population parameter.

Table 5 shows the $BC_a$ interval based on $B = 1000$ bootstrap replications. Here $z_0 = -.095$ and $a = .0194$ from (6.2), so the $BC_a$ interval is nearly the same as the BC interval in this case. Both are nearly the same as the standard interval (1.1). (Here we have used the bootstrap standard error estimate .046 rather than the asymptotic normal-theory estimate .041.) In this case the standard interval is quite acceptable, though this is evident only after the bootstrap analysis. For a random sample of 22 of the 88 students, the standard interval agreed less well with the $BC_a$ interval, $B = 4000$ bootstrap replications.

                     All 88 Students                 Random 22 Students
BC_a                 [.537, .691]  (R/L = .88)       [.550, .825]  (R/L = .75)
Standard             [.543, .695]                    [.574, .847]
MLE                  .619                            .711
(z_0, a, σ̂_B)       (-.095, .0194, .046)            (-.084, .0327, .083)

Table 5.  Central 90% approximate confidence intervals for the proportion of total variability due to the first principal component; test score data from Mardia, Kent, and Bibby (1979). The standard intervals are based on the bootstrap estimate of standard error.
Example 3:  The mean.  Suppose $F$ is a distribution on the real line, and $\theta = t(F)$ equals the expectation $E_F X$. The empirical influence function is $U_i = x_i - \bar x$, so (6.2) gives

$$a = \frac{1}{6}\,\frac{\sum(x_i - \bar x)^3}{[\sum(x_i - \bar x)^2]^{3/2}} = \frac{1}{6\sqrt n}\,\frac{\hat\mu_3}{\hat\mu_2^{3/2}} = \frac{\hat\gamma}{6\sqrt n}. \tag{6.3}$$

Here $\hat\mu_h = \sum(x_i - \bar x)^h/n$, the $h$th sample central moment, and $\hat\gamma = \hat\mu_3/\hat\mu_2^{3/2}$, the sample skewness. It turns out also that $z_0 \doteq \hat\gamma/6\sqrt n$ in this case, by standard Edgeworth arguments. Both $a$ and $z_0$ are typically of order $n^{-1/2}$.

Because the sample mean is such a simple statistic, we can use Edgeworth methods to get asymptotic expressions for the $\alpha$-level endpoint of the $BC_a$ interval:

$$\theta_{BC_a}[\alpha] = \bar x + \hat\sigma\left\{z^{(\alpha)} + \frac{\hat\gamma}{6\sqrt n}\,(2 z^{(\alpha)2} + 1) + O_p(n^{-1})\right\}, \tag{6.4}$$

where $\hat\sigma = (\hat\mu_2/n)^{1/2}$. This compares with

$$\theta_{BC}[\alpha] = \bar x + \hat\sigma\left\{z^{(\alpha)} + \frac{\hat\gamma}{6\sqrt n}\,(z^{(\alpha)2} + 1) + O_p(n^{-1})\right\}, \tag{6.5}$$

for the BC interval (2.3), so the $BC_a$ intervals are shifted approximately $(\hat\gamma/6\sqrt n)\,z^{(\alpha)2}\,\hat\sigma$ further right.

Johnson (1978) suggested modifying the usual t statistic $T = (\bar x - \theta)/\hat\sigma$ to $T_J = T + (\hat\gamma/6\sqrt n)(2T^2 + 1)$, and then considering $T_J$ to have a standard $t_{n-1}$ distribution, in order to obtain confidence intervals for $\theta = E_F X$. Section 10 of Efron (1981) shows that this is much like using the bootstrap distribution of $T^* = (\bar x^* - \bar x)/\hat\sigma^*$ (rather than of $\bar x^* - \bar x$) as a pivotal quantity. Interestingly enough, the Edgeworth expansion of $\theta_J[\alpha]$, the $\alpha$ endpoint of Johnson's interval, coincides with (6.4). The $BC_a$ method makes a "t correction" in the case of $\theta = E_F X$, but it is not the familiar Student's t correction, which operates at third order in (1.2), but rather a second-order correction, coming from the correlation between $\bar x$ and $\hat\sigma$ in non-normal populations. See Remark D, Section 10.
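For the mean, the agreement between (6.2) and the skewness form (6.3), and the size of the shift between (6.4) and (6.5), can be verified directly. The sketch below is illustrative: the data vector is made up, the function names are assumptions, and the $O_p(n^{-1})$ remainder terms are simply dropped.

```python
import math
from statistics import NormalDist

def acceleration_from_influence(U):
    """Formula (6.2)."""
    return sum(u ** 3 for u in U) / (6.0 * sum(u ** 2 for u in U) ** 1.5)

def mean_summary(x):
    """Return (xbar, sigma_hat, c) with c = gamma_hat / (6 sqrt(n)), as in (6.3)."""
    n = len(x)
    xbar = sum(x) / n
    mu2 = sum((xi - xbar) ** 2 for xi in x) / n
    mu3 = sum((xi - xbar) ** 3 for xi in x) / n
    gamma_hat = mu3 / mu2 ** 1.5
    return xbar, math.sqrt(mu2 / n), gamma_hat / (6.0 * math.sqrt(n))

def endpoint_bca(x, alpha):
    """Leading terms of (6.4)."""
    xbar, sigma_hat, c = mean_summary(x)
    z = NormalDist().inv_cdf(alpha)
    return xbar + sigma_hat * (z + c * (2.0 * z * z + 1.0))

def endpoint_bc(x, alpha):
    """Leading terms of (6.5)."""
    xbar, sigma_hat, c = mean_summary(x)
    z = NormalDist().inv_cdf(alpha)
    return xbar + sigma_hat * (z + c * (z * z + 1.0))

x = [1.2, 0.4, 3.1, 0.9, 2.2, 7.5, 0.3, 1.1]   # hypothetical right-skewed sample
xbar, sigma_hat, c = mean_summary(x)
a_influence = acceleration_from_influence([xi - xbar for xi in x])
shift = endpoint_bca(x, 0.95) - endpoint_bc(x, 0.95)
z95 = NormalDist().inv_cdf(0.95)
print(a_influence, c, shift, c * z95 ** 2 * sigma_hat)
```

For a right-skewed sample such as this one $c > 0$, so both endpoints of the $BC_a$ interval move right relative to the BC interval, as the text states.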
The author conjectures that the nonparametric $BC_a$ intervals will be second-order correct for any parameter $\theta$. There is no proof of this, a major difficulty being the definition of second-order correctness in the nonparametric situation. Whether or not it is true, small-sample nonparametric confidence intervals are far from well understood, and should be interpreted with some caution:

Example 4:  The variance.  Suppose $\mathcal X$ is the real line, and $\theta = \mathrm{Var}_F X$, the variance. Line 5 of Table 2 shows the result of applying the nonparametric $BC_a$ method to data sets $x_1, x_2, \ldots, x_{20}$ which were actually i.i.d. samples from a $N(0,1)$ distribution. The number .640 for example is the average of $\theta_{BC_a}[.05]/\hat\theta$ over 40 such data sets, $B = 4000$ bootstrap replications per data set. The upper limit $1.68 \cdot \hat\theta$ is noticeably small, as pointed out by Schenker (1983). The reason is simple: the nonparametric bootstrap distribution of $\hat\theta^*$ has a short upper tail compared to the parametric bootstrap distribution, which is a scaled $\chi^2_{19}$ random variable. The results of Beran (1984), Bickel and Freedman (1981), and Singh (1981) show that the nonparametric bootstrap distribution is highly accurate asymptotically, but of course that isn't a guarantee of good small-sample behavior. Bootstrapping from a smoothed version of $\hat F$, as in Section 5.3 of Efron (1982), alleviates the problem in this particular example.

7.  Geometry of the Nonparametric Case.

Formula (6.2), which allows us to apply the $BC_a$ method nonparametrically, is based on a simple heuristic argument: instead of the actual sample space $\mathcal X$ of the data points $x_i$, consider only distributions $F$ supported on $\hat{\mathcal X} = \{x_1, x_2, \ldots, x_n\}$, the observed data set. This is an $n$-category multinomial family, to which we can apply the results of Section 5. Because the multinomial is an exponential family, Lemma 3 directly gives (6.2).

We will now examine this argument more carefully, with the help of a simple geometric representation. See Section 11 of Efron (1981) for further discussion of this approach to nonparametric confidence intervals.

A typical distribution supported on $\hat{\mathcal X}$ is

$$F(w): \ \text{mass } w_i \text{ on } x_i, \tag{7.1}$$

where $w = (w_1, w_2, \ldots, w_n)$ can be any vector in the simplex $\mathcal S_n = \{w: w_i \ge 0,\ \sum_1^n w_i = 1\}$. The parameter $\theta = t(F)$ is defined on $\mathcal S_n$ by $\theta(w) = t(F(w))$. The central point of the simplex,

$$w^0 = (1/n, 1/n, \ldots, 1/n), \tag{7.2}$$

corresponds to $F(w^0) = \hat F$, the usual empirical distribution; $\theta(w^0) = \hat\theta = t(\hat F)$, the nonparametric MLE of $\theta$. The curved surface

$$C_{\hat\theta} = \{w: \theta(w) = \theta(w^0) = \hat\theta\} \tag{7.3}$$

comprises those distributions $F(w)$ having $\theta(w) = \hat\theta$. The vector $U$ is orthogonal to $C_{\hat\theta}$ at $w^0$, as shown in Figure 1, which follows from definition (6.1) of the empirical influence function. ($U$ is essentially the gradient of $\theta(w)$ at $w^0$, see Efron (1982), Section 6.3.)

With $w$ unknown, but $\hat{\mathcal X} = \{x_1, \ldots, x_n\}$ considered fixed, we can imagine setting a confidence interval for $\theta(w)$ on the basis of a hypothetical sample $x_1^*, x_2^*, \ldots, x_n^* \overset{\text{iid}}{\sim} F(w)$. A sufficient statistic is the vector of proportions $P_i = \#\{x_j^* = x_i\}/n$, say $P = (P_1, P_2, \ldots, P_n)$, with distribution

$$P \sim \mathrm{Mult}_n(n, w)/n, \qquad (w \in \mathcal S_n). \tag{7.4}$$

The notation here indicates $n$ draws from an $n$-category multinomial, having probability $w_i$ for category $i$. We suppose that we have observed $P = w^0$ in (7.4), i.e. that the hypothetical sample $x_1^*, \ldots, x_n^*$ equals the actual sample $x_1, \ldots, x_n$.

Distributions (7.4) form an $n$-parameter exponential family (5.4), with $y = P$, $\eta_i = \log(n w_i) + c$, and $\psi(\eta) = \log(\sum_1^n e^{\eta_i}/n)$. Here $c$ can be any constant, since all vectors $\eta + c\mathbf 1$ correspond to the same probability vector $w$, namely $w_i = e^{\eta_i}/\sum_{j=1}^n e^{\eta_j}$.
Figure 1.  All probability distributions supported on $\{x_1, x_2, \ldots, x_n\}$ are represented as the simplex $\mathcal S_n$. The central point $w^0$ corresponds to the empirical distribution $\hat F$. The curves indicate level surfaces of constant value of the parameter $\theta$. In particular $C_{\hat\theta}$ comprises those probability distributions having $\theta$ equal to $\theta(w^0) = \hat\theta$, the MLE. The least favorable family $\hat{\mathcal F}$ passes through $w^0$ in the direction $U$, orthogonal to $C_{\hat\theta}$.
If one accepts the reduction of the original nonparametric problem to (7.4), with observed value $P = w^0$, then it is easy to carry through the least favorable family calculations (5.1)-(5.2): (i) $\hat\eta = 0$; (ii) $\hat\mu = U$; (iii) $\hat f_\tau$ is the member of (7.4) corresponding to $\hat\eta + \tau\hat\mu = \tau U$, namely

$$P^* \sim \mathrm{Mult}_n(n, w^\tau)/n \qquad \left(w_i^\tau = e^{\tau U_i}\Big/\sum_{j=1}^n e^{\tau U_j}\right); \tag{7.5}$$

(iv) finally, formula (6.2) follows directly from Lemma 3, by differentiating $\hat\psi(\tau) = \log(\sum_1^n e^{\tau U_i}/n)$ (and remembering that $\sum U_i = 0$).
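The differentiation in step (iv) is short enough to write out. The following is a sketch of that calculation, using only $\hat\psi(\tau) = \log(\sum_1^n e^{\tau U_i}/n)$, the constraint $\sum U_i = 0$, and formula (5.5); the intermediate derivative expressions are the standard cumulant identities for a log moment generating function.

```latex
\begin{aligned}
\hat\psi^{(1)}(0) &= \frac{1}{n}\sum_{i=1}^n U_i = 0, \\
\hat\psi^{(2)}(0) &= \frac{1}{n}\sum_{i=1}^n U_i^2 - \Big(\frac{1}{n}\sum_{i=1}^n U_i\Big)^2
                   = \frac{1}{n}\sum_{i=1}^n U_i^2, \\
\hat\psi^{(3)}(0) &= \frac{1}{n}\sum_{i=1}^n U_i^3
                   \quad\text{(the terms involving } \textstyle\sum U_i \text{ vanish)}, \\
a &= \frac{1}{6\sqrt{n}}\,\frac{\hat\psi^{(3)}(0)}{(\hat\psi^{(2)}(0))^{3/2}}
   = \frac{1}{6\sqrt{n}}\,\frac{n^{-1}\sum U_i^3}{n^{-3/2}\,(\sum U_i^2)^{3/2}}
   = \frac{1}{6}\,\frac{\sum U_i^3}{(\sum U_i^2)^{3/2}} .
\end{aligned}
```

The factors of $n$ cancel exactly, which is why no explicit sample size appears in (6.2).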

Only step (ii) is not immediate, but it is a straightforward consequence of definition (5.1) and standard properties of the multinomial. We have already noted that $U$ is orthogonal to $C_{\hat\theta}$, so $U$ is proportional to $\hat\nabla$ in (5.1). However $-\ddot\ell_{\hat\eta} = I - \mathbf 1\mathbf 1'/n$, which has pseudo-inverse $I$. Thus $\hat\mu$ is proportional to $U$. Since (5.5), (5.6) produce the same value of $a$ if $\hat\mu$ is multiplied by any constant, this effectively gives $\hat\mu = U$.

An interesting case which provides some support for the nonparametric $BC_a$ method is that where the sample space $\mathcal X$ is finite to begin with, say $\mathcal X = \{1, 2, \ldots, L\}$. A typical distribution on $\mathcal X$ is $f = (f_1, \ldots, f_L)$, where $f_\ell = \mathrm{Prob}\{x_i = \ell\}$. The observed sample proportions $\hat f = (\hat f_1, \ldots, \hat f_L)$, $\hat f_\ell = \#\{x_i = \ell\}/n$, are sufficient, with distribution $\hat f \sim \mathrm{Mult}_L(n, f)/n$. This is an $L$-parameter exponential family, so the theory of Section 5 applies. It turns out that Lemma 3 agrees with formula (6.2) in this case. Nonparametric $BC_a$ intervals are the same as parametric $BC_a$ intervals when $\mathcal X$ is finite. See Remarks G and H of Efron (1979) for the first-order bootstrap asymptotics of finite sample spaces.

Family (7.4) was used in Section 11 of Efron (1981) to motivate a method called nonparametric tilting, a nonparametric analogue of the standard hypothesis-testing approach to confidence interval construction. The one-parameter tilting family, (11.12) of Efron (1981), is closely related to the least favorable family $\hat{\mathcal F}$ in Figure 1. Table 5 of Efron (1981) considers samples of size $n = 15$ for the one-sided exponential density $f(x) = e^{-(x+1)}$, $x > -1$. Central 90% tilting intervals for $\theta = E_F X$ were constructed for each of ten such samples, averaging [-.34, .50]. The corresponding nonparametric $BC_a$ intervals averaged [-.34, .52], and were quite similar to the tilting intervals on a sample-by-sample comparison. The nonparametric $BC_a$ method is computationally simpler than nonparametric tilting, and seems likely to give similar results in most problems.

We end this section with a useful approximation formula for the bias-correction constant $z_0$, developed jointly with Timothy Hesterberg. In addition to (6.1) we need the second-order empirical influence function

$$V_{ij} = \lim_{\Delta\to 0}\frac{t((1-2\Delta)\hat F + \Delta\delta_i + \Delta\delta_j) - t((1-\Delta)\hat F + \Delta\delta_i) - t((1-\Delta)\hat F + \Delta\delta_j) + t(\hat F)}{\Delta^2}. \tag{7.6}$$

Define $z_{01} = (1/6)\sum_1^n U_i^3/(\sum_1^n U_i^2)^{3/2}$ (the right side of (6.2)) and

$$z_{02} = \left[\frac{U'VU}{\|U\|^2} - \mathrm{tr}\,V\right]\Big/(2n\|U\|), \tag{7.7}$$

where $V$ is the $n \times n$ matrix $(V_{ij})$.

Lemma 4.  The bias-correction constant $z_0$ approximately equals

$$z_0 \doteq \Phi^{-1}\{2\,\Phi(z_{01})\,\Phi(z_{02})\}. \tag{7.8}$$

For the law school data, Example 1, $z_{01} = -.0817$ and $z_{02} = -.0067$, giving $z_0 = -.0869$ from (7.8), compared to $z_0 = -.0927 \pm .0039$ from $B = 100{,}000$ bootstrap replications.

The term $z_{01}$ relates to skewness in $\hat{\mathcal F}$ while $z_{02}$ is a geometrical term arising from the curvature of $C_{\hat\theta}$ at $w^0$. It is analogous to formula (A15) of Efron (1984). Lemma 4 will not be proved here, but is referred to in Section 8.
8.  Bootstrap Sample Sizes.

How many bootstrap replications need we take? So far we have pretended that $B = \infty$, but if Monte Carlo methods are necessary to obtain the bootstrap distribution (2.1), then $B$ must be finite, usually the smaller the better. This section gives rough estimates of how small $B$ may be taken in practice. The results are presented without proof, all being standard exercises in error estimation, see for instance Chapter 10 of Kendall and Stuart (1958).

First consider the easy problem of estimating the standard error of $\hat\theta$ via the bootstrap. The bootstrap estimate based on $B$ replications, $\hat\sigma_B = [\sum_{b=1}^B(\hat\theta^*_b - \hat\theta^*_\cdot)^2/(B-1)]^{1/2}$, has conditional coefficient of variation (standard deviation divided by expectation)

$$CV\{\hat\sigma_B \mid y\} \doteq \left[\frac{\hat\delta + 2}{4B}\right]^{1/2}, \tag{8.1}$$

where $\hat\delta$ is the kurtosis of the bootstrap distribution $\hat G$. The notation indicates that the observed data $y$ is fixed in this calculation. As $B \to \infty$, (8.1) $\to 0$ and $\hat\sigma_B \to \hat\sigma$, the ideal bootstrap estimate of standard error.

Of course $\hat\sigma$ itself will usually not estimate the true standard error $\sigma = SD_\theta\{\hat\theta\}$ perfectly. Let $CV(\hat\sigma)$ be the coefficient of variation of $\hat\sigma$, unconditional now, averaging over the possible realizations of $y$. [For example if $n = 20$, $\hat\theta = \bar x$, $x_i \overset{\text{iid}}{\sim} N(0,1)$, then $CV(\hat\sigma) \doteq (1/40)^{1/2} = .16$.] The unconditional CV of $\hat\sigma_B$ is then approximated by

$$CV(\hat\sigma_B) \doteq \left[CV^2(\hat\sigma) + \frac{E\hat\delta + 2}{4B}\right]^{1/2}. \tag{8.2}$$

Table 6 displays $CV(\hat\sigma_B)$ for various choices of $B$ and $CV(\hat\sigma)$, assuming $E\hat\delta = 0$. For values of $CV(\hat\sigma) \ge .10$, typical in practice, there is little improvement past $B = 100$. In fact $B$ as small as 25 gives reasonable results.
                 B →     25     50    100    200     ∞
CV(σ̂)   .25            .29    .27    .26    .25    .25
  ↓      .20            .24    .22    .21    .21    .20
         .15            .21    .18    .17    .16    .15
         .10            .17    .14    .12    .11    .10
         .05            .15    .11    .09    .07    .05
         0              .14    .10    .07    .05    0

Table 6.  Coefficient of variation of $\hat\sigma_B$, the bootstrap estimate of standard error, as a function of $B$, the number of bootstrap replications, and $CV(\hat\sigma)$, the limiting CV as $B \to \infty$. Based on (8.2), assuming $E\hat\delta = 0$.
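Table 6 follows directly from (8.2), read as $CV(\hat\sigma_B) \doteq [CV^2(\hat\sigma) + (E\hat\delta + 2)/(4B)]^{1/2}$, and can be regenerated with a few lines. The function and variable names below are illustrative assumptions; the kurtosis term is set to zero as in the table.

```python
import math

def cv_sigma_B(cv_sigma, B, expected_kurtosis=0.0):
    """Formula (8.2): unconditional CV of the B-replication bootstrap standard error."""
    monte_carlo_term = 0.0 if math.isinf(B) else (expected_kurtosis + 2.0) / (4.0 * B)
    return math.sqrt(cv_sigma ** 2 + monte_carlo_term)

B_VALUES = [25, 50, 100, 200, math.inf]
CV_VALUES = [0.25, 0.20, 0.15, 0.10, 0.05, 0.0]

table6 = {cv: [cv_sigma_B(cv, B) for B in B_VALUES] for cv in CV_VALUES}

for cv, row in table6.items():
    print(f"{cv:4.2f}  " + "  ".join(f"{v:4.2f}" for v in row))
```

The two terms under the square root separate sampling variability in the data from Monte Carlo variability, which makes it easy to see why increasing $B$ past about 100 buys little once $CV(\hat\sigma)$ is .10 or more.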


Now we return to bootstrap confidence intervals. Let $\hat\theta_B[\alpha]$ be the $\alpha$-endpoint of either the BC or $BC_a$ interval obtained from $B$ bootstrap replications (either parametrically or nonparametrically). Let $\hat\theta[\alpha] = \lim_{B\to\infty}\hat\theta_B[\alpha]$. The following formula for the conditional CV of $\hat\theta_B[\alpha] - \hat\theta[\alpha]$ assumes that the bootstrap cdf $\hat G$ is roughly normal, and that $z_0$ and $a$ are known, for example from (7.8) and (5.3) or (6.2):

$$CV\{\hat\theta_B[\alpha] - \hat\theta[\alpha] \mid y\} \doteq \frac{1}{B^{1/2}\,z^{(\alpha)}}\left[\frac{\alpha(1-\alpha)}{\phi(z^{(\alpha)})^2}\right]^{1/2}, \tag{8.3}$$

where $\phi(z) = e^{-z^2/2}/\sqrt{2\pi}$. Here is a brief tabulation of (8.3) $\times\, B^{1/2}$:

α:                  .75    .90    .95    .975
(8.3) × B^{1/2}:    4.08   1.78   1.65   1.86          (8.4)

If $B = 1000$ for instance, then $CV\{\hat\theta_B[.95] - \hat\theta[.95] \mid y\} = 1.65/1000^{1/2} = .052$. Reducing $B$ to 250 increases the conditional CV to .104. This last figure will often be too big. The whole purpose of developing a theory better than (1.1) is to capture second-order effects. As our examples have indicated, these become interesting when the asymmetry ratio R/L is larger than say 1.25, or smaller than .75. In such borderline situations, an extra 10% error in each tail due to inadequate bootstrap sampling is unacceptable.

If the bias-correction constant $z_0$ is estimated by Monte Carlo directly from (2.2), rather than from (7.8), then

$$CV\{\hat\theta_B[\alpha] - \hat\theta[\alpha] \mid y\} \doteq \frac{1}{B^{1/2}\,z^{(\alpha)}}\left[\frac{1}{\phi(0)^2} - \frac{2(1-\alpha)}{\phi(0)\,\phi(z^{(\alpha)})} + \frac{\alpha(1-\alpha)}{\phi(z^{(\alpha)})^2}\right]^{1/2} \tag{8.5}$$

for $\alpha > .50$. This gives larger CVs than (8.3):

α:                  .75    .90    .95    .975
(8.5) × B^{1/2}:    9.23   3.87   3.07   2.94          (8.6)

Comparing (8.6) with (8.4) shows that we need $B$ to be about four times larger to get the same CV if $z_0$ is estimated rather than calculated. Formula (7.8) can be very helpful!
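Both formulas can be evaluated directly. The sketch below assumes the readings (8.3) $= [\alpha(1-\alpha)/\phi(z^{(\alpha)})^2]^{1/2}/(B^{1/2} z^{(\alpha)})$ and (8.5) $= [1/\phi(0)^2 - 2(1-\alpha)/(\phi(0)\phi(z^{(\alpha)})) + \alpha(1-\alpha)/\phi(z^{(\alpha)})^2]^{1/2}/(B^{1/2} z^{(\alpha)})$; the function names are illustrative. Under these readings the products with $B^{1/2}$ come out near 2.02, 1.33, 1.28, 1.36 for (8.3) and near 3.04, 1.97, 1.75, 1.71 for (8.5), and it is the squares of these values that agree with the tabulated entries in (8.4) and (8.6). It therefore seems worth checking which quantity the tabulations are meant to show before relying on the specific figures.

```python
import math
from statistics import NormalDist

_ND = NormalDist()

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cv_known_z0(alpha, B=1.0):
    """Formula (8.3) as read above; with B = 1 this is (8.3) x B^{1/2}."""
    z = _ND.inv_cdf(alpha)
    return math.sqrt(alpha * (1.0 - alpha)) / phi(z) / (math.sqrt(B) * z)

def cv_estimated_z0(alpha, B=1.0):
    """Formula (8.5) as read above, for alpha > .50; with B = 1 this is (8.5) x B^{1/2}."""
    z = _ND.inv_cdf(alpha)
    bracket = (1.0 / phi(0.0) ** 2
               - 2.0 * (1.0 - alpha) / (phi(0.0) * phi(z))
               + alpha * (1.0 - alpha) / phi(z) ** 2)
    return math.sqrt(bracket) / (math.sqrt(B) * z)

ALPHAS = [0.75, 0.90, 0.95, 0.975]
known = [cv_known_z0(al) for al in ALPHAS]
estimated = [cv_estimated_z0(al) for al in ALPHAS]

for al, k, e in zip(ALPHAS, known, estimated):
    print(f"alpha={al:5.3f}  (8.3)xsqrt(B)={k:5.2f} (squared {k*k:5.2f})  "
          f"(8.5)xsqrt(B)={e:5.2f} (squared {e*e:5.2f})")
```

Whichever reading is intended, the qualitative conclusion is unchanged: estimating $z_0$ by Monte Carlo inflates the variability of the endpoint, so an analytic approximation such as (7.8) reduces the number of replications needed.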

Both (8.3) and (8.5) assume that the bootstrap cdf $\hat G$ is estimated by straightforward Monte Carlo sampling, as in (2.1). Professor M. V. Johns (personal communication) has developed importance sampling methods which greatly accelerate the estimation of $\hat G$ in some situations.


9.  One-Parameter  Families. 

We return to the simple situation $\hat\theta \sim f_\theta$, where there are no nuisance parameters, and where we want a confidence interval for the real-valued parameter $\theta$ based on a real-valued summary statistic $\hat\theta$. This section gives a more extensive discussion of the acceleration constant $a$, which has played a basic role in our considerations. Three familiar types of one-parameter families will be investigated: exponential families, translation families, and transformation families.
Efron (1982A) considers the following question: for a given family $\hat\theta \sim f_\theta$, do there exist mappings $\hat\phi = g(\hat\theta)$, $\phi = h(\theta)$ such that $\hat\phi = \phi + \sigma_\phi\,q(Z)$, $Z \sim N(0,1)$, as in (3.3)? This last form, a General Scaled Transformation Family (GSTF), generalizes the concept of the ideal normalization, where $\hat\phi = \phi + Z$.

The question above is answered in terms of the diagnostic function $D(z,\theta) = [\phi(0)/\phi(z)]\,[\dot F_\theta(\hat\theta^{(\alpha)}_\theta)/\dot F_\theta(\mu_\theta)]$. Here $\phi(z)$ is the standard normal density $(2\pi)^{-1/2}e^{-z^2/2}$; $F_\theta$ is the cdf $F_\theta(s) = \mathrm{Prob}_\theta\{\hat\theta < s\}$; $\dot F_\theta(s) = (\partial/\partial\theta)F_\theta(s)$; $\alpha = \Phi(z)$; $\hat\theta^{(\alpha)}_\theta$ is the $100\cdot\alpha$ percentile of $\hat\theta$ given $\theta$, $\hat\theta^{(\alpha)}_\theta = F_\theta^{-1}(\alpha)$; and $\mu_\theta$ is the median of $\hat\theta$ given $\theta$, $\mu_\theta = \hat\theta^{(.5)}_\theta = F_\theta^{-1}(.5)$. It is shown that the form of $\sigma_\phi$ and $q(z)$ in (3.3) can be inferred from $D(z,\theta)$, the main advantage being that $D(z,\theta)$ is computed without knowledge of the normalizing transformations $g$, $h$.

The connection of transformation family theory with the acceleration constant $a$ is the following: define

$$\epsilon_\theta \equiv \frac{\partial}{\partial z} D(z,\theta)\Big|_{z=0} . \qquad (9.1)$$


If $q(Z)$ in (3.3) is symmetrically distributed about zero, a situation called a Symmetric Scaled Transformation Family (SSTF), then

$$\epsilon_\theta = \frac{d\sigma_\phi}{d\phi} , \qquad (9.2)$$

see (4.11) of Efron (1982A). A more complicated relationship holds for the GSTF case.


Notice that (9.2) is quite close to our original description of "a" as the rate of change of standard deviation on the normalized scale. As a matter of fact, we can transform (2.4), (2.5) into an SSTF by considering the statistic

$$\hat\phi^{(0)} = \hat\phi + \frac{z_0}{1-az_0}\,\sigma_{\hat\phi} = \hat\phi + \frac{z_0}{1-az_0}\,(1+a\hat\phi) \qquad (9.3)$$

instead of $\hat\phi$ itself. Then it is easy to show that

$$\hat\phi^{(0)} = \phi + (1+\epsilon_0\phi)Z \qquad \Big(\epsilon_0 = \frac{a}{1-az_0}\Big) , \qquad (9.4)$$

an SSTF with $\sigma_\phi = 1+\epsilon_0\phi$, $\epsilon_\phi = \epsilon_0$ for all $\phi$. [The quantity $\epsilon_0$ has the same definition in (9.4) as in (3.6).]

Example. For $\hat\theta \sim \theta\chi^2_{19}/19$ as in Table 2, $\epsilon_\theta = .1090$ for all $\theta$ (using (9.6) below). Also $z_0 = \Phi^{-1}\,\mathrm{Prob}\{\chi^2_{19} < 19\} = .1082$. The relationship $a = \epsilon_0/(1+\epsilon_0 z_0)$, obtained by solving for $a$ in (9.4), gives $a = .1077$, the value used in Table 2.
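
The three constants of this example can be checked numerically. The sketch below is an illustration only and is not part of the original calculations. It assumes nothing beyond Python's standard library, uses a power series for the chi-square cdf, and the helper names (`gamma_p`, `chi2_cdf`, `chi2_pdf`, `chi2_median`) are our own. It evaluates $\epsilon_\theta$ from the median form $-\varphi(0)\dot\ell_\theta(\mu_\theta)/\dot\mu_\theta f_\theta(\mu_\theta)$, which for this scale family reduces (by our own algebra) to $\varphi(0)(19-m)/(2 m f_{\chi^2_{19}}(m))$, with $m$ the median of $\chi^2_{19}$.

```python
import math
from statistics import NormalDist


def gamma_p(a, x):
    """Regularized lower incomplete gamma P(a, x), computed by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x, k):
    """Prob{chi-square with k degrees of freedom < x}."""
    return gamma_p(k / 2.0, x / 2.0)


def chi2_pdf(x, k):
    """Density of a chi-square variate with k degrees of freedom."""
    return math.exp((k / 2.0 - 1.0) * math.log(x) - x / 2.0
                    - (k / 2.0) * math.log(2.0) - math.lgamma(k / 2.0))


def chi2_median(k, lo=1.0, hi=100.0):
    """Median of chi-square(k) by bisection on the cdf."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


K = 19
norm = NormalDist()

# z0 = Phi^{-1} Prob{chi2_19 < 19}
z0 = norm.inv_cdf(chi2_cdf(float(K), K))

# epsilon via the median form, specialised to theta * chi2_K / K (our algebra)
m = chi2_median(K)
eps = norm.pdf(0.0) * (K - m) / (2.0 * m * chi2_pdf(m, K))

# a = eps / (1 + eps * z0), the relationship quoted in the example
a = eps / (1.0 + eps * z0)

print(round(z0, 4), round(eps, 4), round(a, 4))
```

Under these assumptions the printed values should agree with the example's .1082, .1090 and .1077 to about three decimals.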

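We show below that under reasonable asymptotic conditions,

$$\epsilon_\theta \doteq \frac{\mathrm{SKEW}_\theta(\dot\ell_\theta)}{6} , \qquad (9.5)$$

where $\epsilon_\theta = (\partial/\partial z) D(z,\theta)|_{z=0}$ as in (9.1). This last definition of $\epsilon_\theta$ can be evaluated for any family $\hat\theta \sim f_\theta$, assuming only that the necessary derivatives exist. The point here is that $\mathrm{SKEW}_\theta(\dot\ell_\theta)/6$ always approximates $\epsilon_\theta$ (9.1), and in SSTF families $\epsilon_\theta$ has the acceleration interpretation (9.2).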

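Now to show (9.5). It is possible to reexpress (9.1) as

$$\epsilon_\theta = -\frac{\varphi(0)\,\dot\ell_\theta(\mu_\theta)}{\dot\mu_\theta\, f_\theta(\mu_\theta)} , \qquad (9.6)$$

where $\dot\ell_\theta(\hat\theta) = (\partial/\partial\theta)\log f_\theta(\hat\theta)$ and $\dot\mu_\theta = (\partial/\partial\theta)\mu_\theta$, the rate of change of the median $\mu_\theta$ with respect to $\theta$.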

For notational convenience suppose that $\theta = 0$. Instead of $\hat\theta$, consider the statistic $X = \dot\ell_0(\hat\theta)/i_0$, where $i_\theta$ equals the Fisher information $E_\theta\{\dot\ell_\theta(\hat\theta)^2\}$. The parameter $\epsilon_\theta$ is invariant under one-to-one changes of statistic, so we can evaluate the right side of (9.6) in terms of $X$, $\epsilon_\theta = -\varphi(0)\,\dot\ell^X_\theta(\mu^X_\theta)/\dot\mu^X_\theta f^X_\theta(\mu^X_\theta)$. For $\theta = 0$, $X$ has expectation $E_0 X = 0$ and standard deviation $\sigma^X_0 = i_0^{-1/2}$; also $\dot\ell^X_0(0) = 0$, since $X = 0$ implies $\theta = 0$ is a solution of the MLE equation.

Assuming the usual asymptotic convergence properties, as in (4.4), (4.6), we have the following approximations, with $\gamma_0$ denoting the skewness of $X$: $\dot\mu^X_0 \doteq 1$; $\mu^X_0 \doteq -\gamma_0 i_0^{-1/2}/6$; $f^X_0(\mu^X_0) \doteq \varphi(0)\, i_0^{1/2}$; $\dot\ell^X_0(\mu^X_0) \doteq -i_0^{1/2}\gamma_0/6$. These are derived from standard Edgeworth and Taylor series arguments, which won't be presented here. Taken together they give $\epsilon_0 \doteq \mathrm{SKEW}_0(X)/6 = \mathrm{SKEW}_0(\dot\ell_0)/6$, which is (9.5). The quantity $\mathrm{SKEW}_\theta(\dot\ell_\theta)/6$ is $O(n^{-1/2})$, and the error of approximation in (9.5) is quite small,

$$\epsilon_\theta = \frac{\mathrm{SKEW}_\theta(\dot\ell_\theta)}{6}\,[1+O(n^{-1})] . \qquad (9.7)$$

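Approximation (9.5) is particularly easy to understand in one-parameter exponential families. Suppose $x_1, x_2, \ldots, x_n$ are i.i.d. observations from such a family, with sufficient statistic $y = \bar{x}$ having density $f_\theta(y) = e^{n[\theta y - \psi(\theta)]} f_0(y)$. In this case formula (9.6) becomes

$$\epsilon_\theta = \left[\frac{n\,\sigma_\theta\,\varphi(0)}{\dot\mu^Y_\theta\, f^Y_\theta(\mu^Y_\theta)}\right]\left[\frac{\lambda^Y_\theta - \mu^Y_\theta}{\sigma_\theta}\right] , \qquad (9.8)$$

where $\lambda^Y_\theta = E_\theta\{y\}$, $\mu^Y_\theta = \mathrm{median}_\theta\{y\}$, $\dot\mu^Y_\theta = \partial\mu^Y_\theta/\partial\theta$, etc. The term $[(\lambda^Y_\theta - \mu^Y_\theta)/\sigma_\theta] = (\gamma_\theta/6)[1+O(n^{-1})]$, while $[n\sigma_\theta\varphi(0)/\dot\mu^Y_\theta f^Y_\theta(\mu^Y_\theta)] = 1 + O(n^{-1})$, both of the calculations being quite straightforward. Thus $\epsilon_\theta = (\gamma_\theta/6)[1+O(n^{-1})]$. Since $\dot\ell_\theta(y) = n[y-\lambda_\theta]$, we have $\mathrm{SKEW}_\theta(\dot\ell_\theta(y)) = \mathrm{SKEW}_\theta(y) = \gamma_\theta$, verifying (9.5) for one-parameter exponential families.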

Example. If $Y \sim \mathrm{Poisson}(\theta)$, $\theta = 15$, then $\mathrm{SKEW}_\theta(\dot\ell_\theta)/6 = 1/(6\theta^{1/2}) = .0430$. For the continued version of the Poisson family used in Efron (1982A), $(\partial/\partial z) D(z,\theta)|_{z=0} = .0425$ for $\theta = 15$.
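
The first number in this example follows directly from the Poisson score function $\dot\ell_\theta(y) = y/\theta - 1$. The sketch below is an illustration with names of our own choosing. It computes the skewness of that score by summing over the Poisson probabilities, rather than using the closed form $\theta^{-1/2}$, and divides by 6.

```python
import math


def poisson_pmf(y, theta):
    """Poisson(theta) probability of the count y."""
    return math.exp(-theta + y * math.log(theta) - math.lgamma(y + 1))


def score(y, theta):
    """Derivative of the Poisson log likelihood with respect to theta."""
    return y / theta - 1.0


theta = 15.0
support = range(0, 400)  # the truncated tail mass is negligible for theta = 15
probs = [poisson_pmf(y, theta) for y in support]
scores = [score(y, theta) for y in support]

mean = sum(p * s for p, s in zip(probs, scores))
mu2 = sum(p * (s - mean) ** 2 for p, s in zip(probs, scores))
mu3 = sum(p * (s - mean) ** 3 for p, s in zip(probs, scores))

skew = mu3 / mu2 ** 1.5
print(round(skew / 6.0, 4))
```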

Translation Families. Suppose we observe a translation family $\hat\zeta = \zeta + W$ as in (2.9). Express $W$ as a function $q(Z)$ of $Z \sim N(0,1)$, for simplicity assuming $q(0) = 0$ and $q'(0) = 1$ as in Efron (1982A). Then $z_0 = \Phi^{-1}\,\mathrm{Prob}\{\hat\zeta < \zeta\} = 0$.

In this case it looks like methods based on the percentiles of the bootstrap distribution must give wrong answers, since if $W$ is long-tailed to the right then the correct interval (2.10) is long-tailed to the left, and vice-versa. However the $BC_a$ method produces at least roughly correct intervals, as we saw in the proof of Lemma 1.

What happens is the following: for any constant $A$ the transformation $g_A(t) = (e^{At}-1)/A$ gives $\hat\phi = g_A(\hat\zeta)$, $\phi = g_A(\zeta)$ and $Z_A = g_A(W)$ satisfying

$$\hat\phi = \phi + \sigma_\phi Z_A \qquad (\sigma_\phi = 1+A\phi) . \qquad (9.9)$$

The Taylor series for $W = q(Z)$ begins $W = Z + (\gamma_W/6)Z^2 + \ldots$ where $\gamma_W = \mathrm{SKEW}(W)$. Then $Z_A = Z + (\gamma_W/6)Z^2 + (A/2)Z^2 + \ldots$. The choice $A = a \equiv -\gamma_W/3$ results in $Z_a = Z + cZ^3 + \ldots$, the quadratic term cancelling out; $Z_a$ is then approximately normal, so (9.9) is approximately situation (2.4), (2.5), with $z_0 = 0$, $a = -\gamma_W/3$. But we know that the $BC_a$ intervals are correct if we can transform to situation (2.4), (2.5). An application of Lemma 2, assuming $Z_a \sim N(0,1)$, shows that $a = -\gamma_W/3 = \mathrm{SKEW}(\dot\ell_\zeta(\hat\zeta))/6$ for the translation family $\hat\zeta = \zeta + W$, reverifying (3.1). [If $Z_a \sim N(0,1)$ in (9.9) then $a$ must equal $\epsilon$, the constant value of $\epsilon_\zeta$, (9.1), for the translation family $\hat\zeta = \zeta + W$; one can show directly that $\epsilon \doteq -\gamma_W/3$ for such a family.]
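
Both claims about $g_A$ are easy to check numerically. The sketch below is illustrative only: the function `q` is a hypothetical choice with leading Taylor terms $Z + (\gamma_W/6)Z^2$ (its exact skewness is not $\gamma_W$), and the value $\gamma_W = 0.6$ is arbitrary. It verifies the identity $\hat\phi = \phi + (1+A\phi)Z_A$ at one point, and uses a central second difference to show that the quadratic coefficient of $Z_A$ vanishes when $A = -\gamma_W/3$.

```python
import math

GAMMA_W = 0.6  # arbitrary illustrative value for the leading skewness coefficient


def g_A(t, A):
    """The transformation g_A(t) = (exp(A t) - 1) / A."""
    return math.expm1(A * t) / A


def q(z):
    """Hypothetical W = q(Z) with q(0) = 0, q'(0) = 1, quadratic term (gamma_W/6) z^2."""
    return z + (GAMMA_W / 6.0) * z * z


def Z_A(z, A):
    """Z_A = g_A(W) expressed as a function of the underlying normal value z."""
    return g_A(q(z), A)


def second_diff(A, h=1e-3):
    """Central second difference of Z_A at z = 0, roughly 2 * (gamma_W/6 + A/2)."""
    return (Z_A(h, A) - 2.0 * Z_A(0.0, A) + Z_A(-h, A)) / h ** 2


A = -GAMMA_W / 3.0

# Identity check: phi_hat = phi + (1 + A * phi) * Z_A at an arbitrary point
zeta, z = 0.7, 1.3
w = q(z)
lhs = g_A(zeta + w, A)
phi = g_A(zeta, A)
rhs = phi + (1.0 + A * phi) * g_A(w, A)

print(lhs - rhs, second_diff(A), second_diff(0.05))
```

With $A = -\gamma_W/3$ the second difference is zero up to discretization error, while a different constant such as $A = 0.05$ leaves a quadratic term of about $2(\gamma_W/6 + A/2) = 0.25$.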

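In the example $\hat\theta \sim \theta\chi^2_{19}/19$, the two constants $z_0$ and $a$ are nearly equal. This is no fluke:

Lemma 5. If $\hat\theta$ is the MLE of $\theta$ in a one-parameter problem having standard asymptotic properties (4.4) or (4.6), then $z_0 \doteq a$,

$$z_0 \equiv \Phi^{-1}\,\mathrm{Prob}_\theta\{\hat\theta<\theta\} = \frac{\mathrm{SKEW}_\theta(\dot\ell_\theta)}{6}\,[1 + O(n^{-1})] . \qquad (9.10)$$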

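Proof: We follow the notation and results of DiCiccio (1984): thus $k_1, k_2, k_3$ equal the first three cumulants of $\dot\ell_\theta$ under $\theta$; $k_{01}, k_{02}, k_{03}$ the first three cumulants of $\ddot\ell_\theta$; $k_{001}$, the first cumulant of $\dddot\ell_\theta$; and $k_{11} = \mathrm{cov}_\theta(\dot\ell_\theta, \ddot\ell_\theta)$. (So $k_2 = i_\theta$, the Fisher information.) All cumulants are assumed to be $O(n)$. Then the relative bias of $\hat\theta$ is

$$b \equiv \frac{E_\theta(\hat\theta-\theta)}{\mathrm{Var}_\theta(\hat\theta)^{1/2}} = \frac{k_{001} - 2k_3}{6k_2^{3/2}} + O(n^{-3/2}) , \qquad (9.11)$$

while $\hat\theta$ has skewness

$$\gamma_\theta = \frac{k_{001} - k_3}{k_2^{3/2}} + O(n^{-3/2}) . \qquad (9.12)$$

Both $b$ and $\gamma_\theta$ are $O(n^{-1/2})$.

Standard Edgeworth theory now gives

$$\mathrm{Prob}_\theta\{\hat\theta<\theta\} = \Phi(-b) - \frac{\gamma_\theta}{6}\,\varphi(b)\,(b^2-1) + O(n^{-3/2})$$

$$= .5 + \varphi(0)\,\frac{(2k_3 - k_{001}) + (k_{001} - k_3)}{6k_2^{3/2}} + O(n^{-3/2})$$

$$= .5 + \varphi(0)\,\frac{k_3}{6k_2^{3/2}} + O(n^{-3/2}) .$$

Since $\mathrm{SKEW}_\theta(\dot\ell_\theta) = k_3/k_2^{3/2}$, this verifies (9.10). □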
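
The passage from the Edgeworth formula to the final line can be spelled out. The following is our own expansion of the stated formula, using only that $b$ and $\gamma_\theta$ are both $O(n^{-1/2})$; it shows that the probability depends on the two quantities only through $\gamma_\theta/6 - b$.

```latex
\begin{align*}
\Phi(-b) &= \tfrac{1}{2} - \varphi(0)\,b + O(b^{3}), \\
\frac{\gamma_\theta}{6}\,\varphi(b)\,(b^{2}-1)
  &= -\frac{\gamma_\theta}{6}\,\varphi(0) + O(n^{-3/2}), \\
\mathrm{Prob}_\theta\{\hat\theta<\theta\}
  &= \tfrac{1}{2} + \varphi(0)\Big(\frac{\gamma_\theta}{6} - b\Big) + O(n^{-3/2}).
\end{align*}
```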

In multiparameter problems it is no longer true that $z_0 \doteq a$. The geometry of the level surface $C_{\hat\theta}$ adds another term to $z_0$, as in (7.8).


10.  Remarks. 


Remark A. Suppose that instead of (2.4), (2.5) we have $\sigma_\phi = \sigma_0(1+A\phi)$, $\sigma_0 \neq 1$. The transformations $\hat\phi^{(0)} = \hat\phi/\sigma_0$, $\phi^{(0)} = \phi/\sigma_0$ give $\hat\phi^{(0)} = \phi^{(0)} + \sigma^{(0)}_{\phi^{(0)}}(Z - z_0)$, where $\sigma^{(0)}_{\phi^{(0)}} = 1 + a\phi^{(0)}$ and $a = A\sigma_0$, so we are back in form (2.4), (2.5). Notice that the derivative $d(\sigma_\phi/\sigma_0)/d(\phi/\sigma_0) = a$, as in (3.10). In a similar way we can transform (2.4), (2.5) so that $\sigma_\phi = 1$ at any point $\phi_0$; the resulting value of $a$ satisfies (3.10).

Remark B. Instead of using $\hat\phi$ to estimate $\phi$ in (2.4), (2.5) we might change to the estimator $\hat\phi^{(c)} = \hat\phi - c\sigma_{\hat\phi}$, for some constant $c$. It turns out that we are still in situation (2.4), (2.5): $\hat\phi^{(c)} = \phi + \sigma^{(c)}_\phi (Z - z_0^{(c)})$ where

$$\sigma^{(c)}_\phi = 1 + a^{(c)}(\phi - \phi_0^{(c)}) \qquad \Big(\phi_0^{(c)} = \frac{c}{1-ac}\Big) , \qquad (10.1)$$

and $a^{(c)} = a(1-ac)$, $z_0^{(c)} = z_0 + \phi_0^{(c)}$. The choice $c = -z_0/(1-az_0)$ gives $z_0^{(c)} = 0$, as in (9.3), (9.4). The choice $c = a$ gives approximately the MLE of $\phi$. Interestingly enough, the $BC_a$ interval for $\phi$ based on $\hat\phi^{(c)}$ is the same for all choices of $c$. Minor changes in the choice of estimator seem to have little effect on the $BC_a$ intervals in general, though for computational reasons it is best not to use very biased estimators, having large values of $z_0$.
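
The algebra behind this remark can be confirmed numerically. The sketch below is an illustration only: it takes $\hat\phi = \phi + (1+a\phi)(Z - z_0)$ as the form of (2.4), (2.5), reads the constants of (10.1) as $a^{(c)} = a(1-ac)$, $\phi_0^{(c)} = c/(1-ac)$, $z_0^{(c)} = z_0 + \phi_0^{(c)}$, and compares the shifted estimator computed directly with the value given by the re-expressed form. The function name and the test values are our own.

```python
def shifted_estimator_check(a, z0, c, phi, Z):
    """Return (direct, re-expressed, z0_c) for the estimator phi_hat - c * sigma(phi_hat)."""
    def sigma(t):
        return 1.0 + a * t

    # Original form: phi_hat = phi + sigma_phi * (Z - z0)
    phi_hat = phi + sigma(phi) * (Z - z0)

    # Direct computation of the shifted estimator
    direct = phi_hat - c * sigma(phi_hat)

    # Re-expressed form with the constants read from (10.1)
    a_c = a * (1.0 - a * c)
    phi0_c = c / (1.0 - a * c)
    z0_c = z0 + phi0_c
    sigma_c = 1.0 + a_c * (phi - phi0_c)
    reexpressed = phi + sigma_c * (Z - z0_c)

    return direct, reexpressed, z0_c


a, z0 = 0.1077, 0.1082  # constants from the chi-square example, used here as test values
direct, reexpressed, _ = shifted_estimator_check(a, z0, c=0.3, phi=0.5, Z=1.2)

# The special choice c = -z0 / (1 - a z0) should remove the bias correction
c_zero = -z0 / (1.0 - a * z0)
_, _, z0_c_special = shifted_estimator_check(a, z0, c_zero, phi=0.5, Z=1.2)

print(direct - reexpressed, z0_c_special)
```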

Remark C. Section 5 uses the MLE $\hat\theta = t(\hat\eta)$. This has one major advantage: the $BC_a$ interval for $\theta$, based on $\hat\theta$, stays the same under all multivariate transformations (5.9). Stein (1956) notes that the least favorable direction $\hat\mu$ transforms in the obvious way under (5.9), $\tilde\mu = D\hat\mu$, where $D$ is the matrix with $ij$th element $\partial\tilde\eta_i/\partial\eta_j|_{\eta=\hat\eta}$, from which it is easy to check that formula (5.3) is invariant: the constant $a$ is assigned the same value no matter what transformations (5.9) are applied. The bootstrap distribution $\hat G$ is similarly invariant, as shown in Efron (1984), and so is $z_0$. This implies that the $BC_a$ intervals are invariant under transformations (5.9).

Remark D. The multiparametric theory of Section 5 gives an interesting result when applied to location-scale families, where $y = (x,s)$, $\eta = (\theta,\sigma)$, and the family of densities $f_\eta(y)$ has the form

$$f_{\theta,\sigma}(x,s) = \frac{1}{\sigma^2}\, f_{01}\Big(\frac{x-\theta}{\sigma}, \frac{s}{\sigma}\Big) , \qquad (10.2)$$

$f_{01}(x,s)$ being a known bivariate density function.

Suppose we wish to set a confidence interval for the location parameter $\theta$ on the basis of its MLE $\hat\theta$. Parametric bootstrap intervals are based on the distribution of $\hat\theta^*$ when sampling from $f_{\hat\theta,\hat\sigma}(x^*,s^*)$. The BC interval essentially amounts to pretending that $\sigma$ is known (and equal to $\hat\sigma$) in (10.2), and that we have only a location problem to deal with, rather than a location-scale problem.

In contrast, the $BC_a$ interval takes account of the fact that $\sigma$ is unknown. In particular the least favorable direction $\hat\mu$, plotted in the $(\theta,\sigma)$ plane, is not parallel to the $\theta$ axis. It has a component in the $\sigma$ direction, whose magnitude is determined by the correlation between $x$ and $s$. This means that Stein's least favorable family (5.2) does not treat $\sigma$ as a constant.

Table  7  relates  to  the  following  choice  of  f01<X.s)  : 

2 

x~-£-  1  ,  s  |  x  ~  (1+x)  (x^4/14)  ^  ,  (10.3) 

the two $\chi^2$ variates being independent. This is a computationally more tractable version of the problem discussed in Tables 4 and 5 of Efron (1981). Approximate central 90% intervals are given for $\theta$, having observed $(x,s) = (0,1)$. For any other observed $(x,s)$ the intervals transform in the obvious way, $\theta_{xs}[\alpha] = x + s\,\theta_{01}[\alpha]$. Line 3 shows the exact interval, based on inverting the distribution of the pivotal quantity $T = (\hat\theta-\theta)/\hat\sigma$ for situations (10.2), (10.3).

1.  BC interval:       [-.336, .501]     (R/L) = 1.49

2.  $BC_a$ interval:   [-.303, .603]     (R/L) = 1.99

3.  T interval:        [-.336, .670]     (R/L) = 1.99

Table 7. Central 90% intervals for $\theta$, having observed $(x,s) = (0,1)$ from the location-scale family (10.2), (10.3). Line 3 is based on the actual distribution of the pivotal quantity $T = (\hat\theta-\theta)/\hat\sigma$. The observed MLE values are $\hat\theta = 0$, $\hat\sigma = .966$.

In this case the $BC_a$ method makes a large "second-order t correction", as in Example 3 of Section 6, shifting the BC interval a considerable way rightward, and achieving the correct R/L ratio. The length of the $BC_a$ interval is 90% the length of the T interval. This deficiency is a third-order effect, in the spirit of the familiar Student's t correction. It arises from the variability of $\hat\sigma$ as an estimate of $\sigma$, rather than the second-order effect due to the correlation of $\hat\sigma$ with $\hat\theta$.
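
The R/L ratios and the 90% length comparison follow from the interval endpoints alone. The short sketch below is illustrative: it takes the endpoints as read from Table 7, with $\hat\theta = 0$, and recomputes the ratio of right to left distances from $\hat\theta$ together with the relative length of the $BC_a$ and T intervals. The variable names are our own.

```python
THETA_HAT = 0.0  # observed MLE in Table 7

# Endpoints as read from Table 7
intervals = {
    "BC": (-0.336, 0.501),
    "BCa": (-0.303, 0.603),
    "T": (-0.336, 0.670),
}


def right_over_left(lo, hi, center=THETA_HAT):
    """Distance from center to the upper endpoint, divided by distance to the lower one."""
    return (hi - center) / (center - lo)


ratios = {name: right_over_left(lo, hi) for name, (lo, hi) in intervals.items()}
lengths = {name: hi - lo for name, (lo, hi) in intervals.items()}
length_ratio = lengths["BCa"] / lengths["T"]

for name in intervals:
    print(name, round(ratios[name], 2))
print("BCa length / T length:", round(length_ratio, 2))
```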

Remark E. Section 2 says that the family $\hat\theta \sim \theta\chi^2_{19}/19$ can be mapped into form (2.4), (2.5). What are the appropriate mappings? It simplifies the problem to consider the equivalent family $\hat\theta \sim \theta(\chi^2_{19}/c_0)$ where $c_0 = 18.3337 = \mathrm{median}(\chi^2_{19})$. Then $\hat\zeta = g_1(\hat\theta)$, $\zeta = g_1(\theta)$, $W = g_1(\chi^2_{19}/c_0)$, give a translation family (2.9), with median$(W) = 0$, for any mapping $g_1(t) = (\log t)/c_1$. Choosing $c_1 = .3292$ results in $W = q(Z)$ having $q(0) = 0$, $q'(0) = 1$, as in Section 9's discussion of translation families.

Section 9 suggests normalizing a translation family by $g_A(t) = (e^{At}-1)/A$, a good choice for $A$ being the constant $\epsilon_\theta$, (9.1), which equals .1090 for all $\theta$ in the family $\hat\theta \sim \theta(\chi^2_{19}/c_0)$. The combined transformation $g(t) = g_A(g_1(t))$ is $g(t) = 9.1746[t^{.3311} - 1]$. The transformed family $\hat\phi = g(\hat\theta)$, $\phi = g(\theta)$ is of form (2.4), (2.5),

$$\hat\phi = \phi + (1+.1090\cdot\phi)Z \qquad \Big(Z = 9.1746\Big[\Big(\frac{\chi^2_{19}}{c_0}\Big)^{.3311} - 1\Big]\Big) . \qquad (10.4)$$

Numerical calculations verify that $Z$ as defined in (10.4) is very close to a standard normal variate. In fact we have automatically recovered, nearly, the Wilson-Hilferty cube root transformation, Johnson and Kotz (1970). Using (10.4), it is not difficult to show that $g(t)$ as defined above gives approximately (2.4), (2.5) when applied to the family $\hat\theta \sim \theta(\chi^2_{19}/19)$ considered in Section 2, with constants $z_0$ and $a$ as stated.
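
The constants in this remark can be reproduced from the stated definitions. The sketch below is a hedged illustration using only the standard library. It takes $c_0 = 18.3337$ and $A = .1090$ as given in the text, obtains $c_1$ from the condition $q'(0) = 1$ (which, by our own calculation, amounts to $c_1 = \varphi(0)/(c_0 f_{\chi^2_{19}}(c_0))$ when $c_0$ is the median), and then forms the exponent $A/c_1$ and the multiplier $1/A$ of $g(t) = g_A(g_1(t)) = (t^{A/c_1} - 1)/A$.

```python
import math

K = 19
C0 = 18.3337   # median of chi-square(19) as quoted in the text
A = 0.1090     # the constant epsilon quoted in the text


def chi2_pdf(x, k):
    """Density of a chi-square variate with k degrees of freedom."""
    return math.exp((k / 2.0 - 1.0) * math.log(x) - x / 2.0
                    - (k / 2.0) * math.log(2.0) - math.lgamma(k / 2.0))


normal_density_at_zero = 1.0 / math.sqrt(2.0 * math.pi)

# W = log(chi2/c0)/c1 has density c1 * c0 * f(c0) at w = 0; q'(0) = 1 needs this to equal phi(0)
c1 = normal_density_at_zero / (C0 * chi2_pdf(C0, K))

exponent = A / c1      # power of t in the combined transformation
multiplier = 1.0 / A   # leading constant of the combined transformation


def g(t):
    """Combined transformation g(t) = g_A(g_1(t)) = (t**(A/c1) - 1) / A."""
    return (t ** exponent - 1.0) / A


print(round(c1, 4), round(exponent, 4), round(multiplier, 4))
```

The exponent comes out close to 1/3, which is the sense in which the construction nearly recovers the Wilson-Hilferty cube root transformation.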
11.  Proof  of  Theorem  1. 


A monotonic mapping $\hat\phi = g(\hat\theta)$, $\phi = g(\theta)$ transforms the exact confidence interval in the obvious way, $\phi_{EX}[\alpha] = g(\theta_{EX}[\alpha])$, and likewise for the $BC_a$ interval. By using such a mapping we can always make $\hat\phi = 0$ and the distribution of $\hat\phi$ given $\phi = 0$ perfectly normal. Because of (4.6), which says that the distributions of $\hat\theta$ are approaching normality at the usual $O(n^{-1/2})$ rate, the normalizing transformation $g$ is asymptotically linear, $g(\hat\theta) = \hat\theta + c_2\hat\theta^2 + c_3\hat\theta^3 + \ldots$, $c_2 = O(n^{-1/2})$, $c_3 = O(n^{-1})$.

We will assume that the problem is already in the form $\hat\theta = 0$, with the cdf of $\hat\theta$ for $\theta = 0$ normal, say

$$G_0 \sim N(-z_0, 1) . \qquad (11.1)$$

Here $z_0 = \Phi^{-1} P_0\{\hat\theta < 0\}$ must be included because it is not affected by any monotonic transformations; $z_0 \doteq \gamma_0/6$ is $O(n^{-1/2})$ by (4.6). A simple exercise using the mean value theorem of calculus shows that if (4.7) is true in the transformed problem (11.1), then it is true in the original problem.

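Assuming (4.6), $\hat\theta = 0$, and (11.1) we will show that the exact interval has endpoint

$$\theta_{Ex}[\alpha] \doteq \frac{z_0 + z^{(\alpha)}}{1 + \dot\beta_0 - \dot\sigma_0 z^{(\alpha)} + (\dot\gamma_0/6)(z^{(\alpha)2}-1)} + \frac{\ddot\sigma_0}{2}\,(z_0+z^{(\alpha)})^3 , \qquad (11.2)$$

compared to

$$\theta_{BC_a}[\alpha] \doteq \frac{z_0 + z^{(\alpha)}}{1 - \dot\sigma_0(z_0+z^{(\alpha)})} \qquad (11.3)$$

for the $BC_a$ interval. In this section the symbol "$\doteq$" indicates accuracy through $O(n^{-1})$, with errors $O(n^{-3/2})$. Then

$$\frac{\theta_{BC_a}[\alpha] - \theta_{Ex}[\alpha]}{\theta_{BC_a}[\alpha]} \doteq \Big\{\dot\beta_0 + \dot\sigma_0 z_0 + \frac{\dot\gamma_0}{6}(z^{(\alpha)2}-1)\Big\} - \frac{\ddot\sigma_0}{2}\,(z_0+z^{(\alpha)})^2 , \qquad (11.4)$$

which is $O(n^{-1})$ as claimed in Theorem 1.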

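The proof of (11.2) begins by noting that (11.1) implies $\beta_0 = -z_0$, $\sigma_0 = 1$, $\gamma_0 = 0$, $\delta_0 = 0$. Then (4.6) gives

$$E_\theta\hat\theta = \theta + \beta_\theta \doteq (1+\dot\beta_0)\theta - z_0 , \quad \sigma_\theta \doteq 1 + \dot\sigma_0\theta + \ddot\sigma_0\theta^2/2 , \quad \gamma_\theta \doteq \dot\gamma_0\theta , \quad \delta_\theta \doteq 0 , \qquad (11.5)$$

for $\theta = O(1)$. The $100\cdot\alpha$ percentile of $\hat\theta$ given $\theta$ is

$$\hat\theta_\theta^{(\alpha)} \doteq (\theta+\beta_\theta) + \sigma_\theta\Big\{z^{(\alpha)} + \frac{\gamma_\theta}{6}(z^{(\alpha)2}-1)\Big\}$$

$$\doteq [(1+\dot\beta_0)\theta - z_0] + \Big[1+\dot\sigma_0\theta + \frac{\ddot\sigma_0}{2}\theta^2\Big]\Big[z^{(\alpha)} + \frac{\dot\gamma_0\theta}{6}(z^{(\alpha)2}-1)\Big] , \qquad (11.6)$$

using a Cornish-Fisher expansion and (11.5). However the $\theta$ that has $\hat\theta_\theta^{(\alpha)} = 0$ is by definition $\theta_{Ex}[1-\alpha]$. Solving the lower expression in (11.6) for $\theta$, and substituting $1-\alpha$ for $\alpha$, gives (11.2).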

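The proof of (11.3) follows from (2.6), (2.7), and (11.1) (which says that $\hat G \sim N(-z_0, 1)$) if we can establish that $a = \dot\sigma_0(1+O(n^{-1}))$. In fact we show below that

$$\epsilon_\theta = \dot\sigma_0(1+O(n^{-1})) \quad \text{for } \theta = O(n^{-1/2}) , \qquad (11.7)$$

which combines with $a = \epsilon_0/(1+\epsilon_0 z_0) = \epsilon_0(1+O(n^{-1}))$ to give the required result. Formula (11.7) follows from (11.5), which gives the simpler expressions

$$E_\theta\hat\theta \doteq \theta - z_0 , \quad \sigma_\theta \doteq 1 + \dot\sigma_0\theta , \quad \gamma_\theta \doteq 0 , \quad \delta_\theta \doteq 0 \qquad (11.8)$$

for $\theta = O(n^{-1/2})$. The cdf of $\hat\theta$ given $\theta$ is calculated to be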




(11.9) 


0  ,2 


G0(6)  =  <t>(ze)  ze  -  -g-  (z0-l)  , 


zQ  =  (0-0-0Q)/a0,  zQ  =  3Q-  zq-  Straightforward  expansions  give 

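$$D(z^{(\alpha)},\theta) \doteq \frac{1 + \dot\sigma_0 z^{(\alpha)} + \dot\beta_0 + (\dot\gamma_0/6)(z^{(\alpha)2}-1)}{1 + \dot\beta_0 - \dot\gamma_0/6} , \qquad (11.10)$$

from which $\epsilon_\theta = (\partial/\partial z) D(z,\theta)|_{z=0} \doteq \dot\sigma_0/(1+\dot\beta_0-\dot\gamma_0/6)$, verifying (11.7), (11.3), and the main result (11.4).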

The proof that $\theta_{BC_a}[\alpha]$ also matches the Cox-McCullagh formula (4.8) is similar to the proof of Theorem 1, and won't be presented here. The main step is an expression for $\theta_{BC_a}[\alpha]$ involving Lemma 5,

$$\theta_{BC_a}[\alpha] \doteq z^{(\alpha)} + (k_3/6k_2^{3/2})\{z^{(\alpha)2}+1\} + (k_3/6k_2^{3/2})^2\{2z^{(\alpha)}+z^{(\alpha)3}\} . \qquad (11.11)$$
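
The expansion (11.11) can be compared with a closed form under simplifying assumptions. The sketch below is our own illustration, not the omitted proof. It assumes the normalized setting $\hat G \sim N(-z_0, 1)$, sets $a = z_0 = s$ with $s$ standing for $k_3/6k_2^{3/2}$ (as Lemma 5 suggests to leading order), and uses the usual $BC_a$ percentile formula, which then reduces to $(s+z^{(\alpha)})/(1 - s(s+z^{(\alpha)}))$. The difference from the three-term expansion should shrink like $s^3$.

```python
def bca_endpoint_closed_form(z, s):
    """BCa endpoint when G-hat is N(-z0, 1) and a = z0 = s (assumed setting)."""
    u = s + z
    return u / (1.0 - s * u)


def bca_endpoint_expansion(z, s):
    """Three-term expansion: z + s (z^2 + 1) + s^2 (2 z + z^3)."""
    return z + s * (z * z + 1.0) + s * s * (2.0 * z + z ** 3)


z = 1.645  # upper 5% point of the standard normal, to three decimals

err_small = bca_endpoint_closed_form(z, 0.01) - bca_endpoint_expansion(z, 0.01)
err_large = bca_endpoint_closed_form(z, 0.02) - bca_endpoint_expansion(z, 0.02)

# Doubling s should multiply the error by roughly 2**3 if the remainder is O(s^3)
print(err_small, err_large, err_large / err_small)
```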